On Tue, Feb 14, 2017 at 06:13:20PM +1100, Blair Bethwaite wrote:
Hi all,
In IRC last night Dan helpfully confirmed my analysis of an issue we are
seeing when attempting to launch high-memory KVM guests backed by hugepages...
In this case the guests have 240GB of memory allocated from two host NUMA
nodes to two guest NUMA nodes. The trouble is that allocating hugepage-backed
memory for the qemu process seems to take longer than the 30s
QEMU_JOB_WAIT_TIME, so libvirt then most unhelpfully kills the barely-spawned
guest. Dan said there was currently no workaround available, so I'm now
looking at building a custom libvirt which sets QEMU_JOB_WAIT_TIME=60s.
I have two related questions:
1) will this change have any untoward side-effects?
2) if not, is there any reason not to change it in master until a better
solution comes along (or, possibly better, alter qemuDomainObjBeginJobInternal
to give a domain start job a little longer than other jobs)?
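For reference, the change I have in mind is just bumping the constant,
roughly like this (I'm assuming the definition still lives in
src/qemu/qemu_domain.c and is expressed in milliseconds -- I haven't
verified the exact location in the current tree):

    /* src/qemu/qemu_domain.c (assumed location) */
    #define QEMU_JOB_WAIT_TIME (1000ull * 60)   /* previously (1000ull * 30), i.e. 30s */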
What is the actual error you're getting during startup?
I'm not entirely sure QEMU_JOB_WAIT_TIME is the thing that's the problem.
IIRC, the job wait time only comes into play when two threads are contending
on the same QEMU process, i.e. one has an existing job running and a second
comes along and tries to start another job. The second will time out once
QEMU_JOB_WAIT_TIME is reached; the first job, which holds the lock, will
never time out.
During guest startup I didn't believe we had contending jobs in this
way - all the jobs needed to start up QEMU should be serialized, so
I'm not sure why QEMU_JOB_WAIT_TIME would even get hit.
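To illustrate what I mean, here is a simplified standalone sketch of that
behaviour. It is only a model of the semantics, not the libvirt code itself;
the real logic is in qemuDomainObjBeginJobInternal, and all the names below
are made up for the example:

    /* contention_sketch.c: simplified model of the job-wait behaviour
     * described above -- NOT libvirt source.
     * Build with: cc contention_sketch.c -lpthread */
    #include <errno.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    #define JOB_WAIT_TIME 30   /* seconds; plays the role of QEMU_JOB_WAIT_TIME */

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    static int job_active;

    /* The thread holding the job: it runs for as long as it needs to
     * and is never subject to any timeout. */
    static void *holder(void *unused)
    {
        pthread_mutex_lock(&lock);
        job_active = 1;
        pthread_mutex_unlock(&lock);

        sleep(120);   /* stand-in for a slow operation, e.g. hugepage allocation */

        pthread_mutex_lock(&lock);
        job_active = 0;
        pthread_cond_broadcast(&cond);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    /* A second thread contending for the job: it only waits up to
     * JOB_WAIT_TIME before giving up. */
    static void *contender(void *unused)
    {
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += JOB_WAIT_TIME;

        pthread_mutex_lock(&lock);
        while (job_active) {
            if (pthread_cond_timedwait(&cond, &lock, &deadline) == ETIMEDOUT &&
                job_active) {
                fprintf(stderr, "cannot acquire job: timed out after %d seconds\n",
                        JOB_WAIT_TIME);
                pthread_mutex_unlock(&lock);
                return NULL;
            }
        }
        printf("job acquired\n");
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t h, c;
        pthread_create(&h, NULL, holder, NULL);
        sleep(1);   /* let the holder take the job before the contender arrives */
        pthread_create(&c, NULL, contender, NULL);
        pthread_join(c, NULL);
        pthread_join(h, NULL);
        return 0;
    }

The point being: the holder above can run for two minutes without being
interrupted; only a second, contending thread ever hits the 30 second limit.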
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|