On Tue, Jun 12, 2012 at 4:07 PM, Whit Blauvelt <whit(a)transpect.com> wrote:
On Tue, Jun 12, 2012 at 11:35:28AM +0400, Andrey Korolyov wrote:
> Just a stupid question: did you pin the guest vcpus or the NIC's hw
> queues? An unpinned VM may seriously affect I/O performance when it runs
> on the same core set as the NIC (hw RAID/FC/etc.).
Thanks. No question is stupid. Obviously VM I/O shouldn't be this slow, so
I'm missing something. In my defense, there is no single coherent set
of documents on this stuff, unless those are kept in a secret place. It
would be a fine thing if a few of the people who know all the "obvious"
stuff about libvirt-based KVM configuration would collaborate and document
it fully somewhere.
When I Google "pinned vcpu" all the top responses are about Xen. I run KVM.
I find mention that "KVM uses the linux scheduler for distributing workload
rather than actually assigning physical CPUs to VMs," at
http://serverfault.com/questions/235143/can-i-provision-half-a-core-as-a-....
Is that wrong? Are you suggesting RAID will end up on one CPU core, and if
the VM ends up on the same core there's I/O trouble? I can see that for
software RAID. But we're running hardware RAID. Isn't that handled in the
hardware of the RAID controller? Isn't that the point of hardware
RAID, to keep it off the CPU?
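In case it helps, a quick way to see where the controller and NIC
interrupts are actually landing (a rough Python sketch; the "megasas" and
"eth" match strings are only examples - substitute whatever names appear
in your own /proc/interrupts):

    # Show which host CPUs are servicing interrupts for the RAID HBA and
    # the NICs, by scanning /proc/interrupts for matching device names.
    MATCH = ("megasas", "eth")        # example names, adjust for your hardware

    with open("/proc/interrupts") as f:
        cpus = f.readline().split()   # header row: CPU0 CPU1 ...
        for line in f:
            if any(m in line for m in MATCH):
                fields = line.split()
                irq = fields[0].rstrip(":")
                counts = fields[1:1 + len(cpus)]
                busy = [c for c, n in zip(cpus, counts) if int(n) > 0]
                print("IRQ %s (%s): busy on %s"
                      % (irq, fields[-1], ", ".join(busy) or "none"))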
In any case the slow I/O is true of all the VMs, so presumably they're
properly spread out over the CPU cores (there are more cores than VMs, and
each VM is configured to take only one), which would rule out the general
problem being shared use of any one core by other system processes.
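One way to check that rather than presume it: libvirt reports both the
physical CPU each vcpu is currently running on and its allowed-CPU mask,
and an unpinned vcpu shows an affinity covering every host CPU. A minimal
sketch with the libvirt Python bindings:

    # Print, for every running domain, which physical CPU each vcpu is on
    # and which CPUs it is allowed to run on (its affinity mask).
    import libvirt

    conn = libvirt.open("qemu:///system")
    for dom_id in conn.listDomainsID():
        dom = conn.lookupByID(dom_id)
        cpu_info, cpu_maps = dom.vcpus()   # (per-vcpu info, per-vcpu affinity)
        for (vcpu, state, cpu_time, cur_cpu), mask in zip(cpu_info, cpu_maps):
            allowed = [i for i, ok in enumerate(mask) if ok]
            print("%s vcpu %d: on CPU %d, allowed on %s"
                  % (dom.name(), vcpu, cur_cpu, allowed))
    conn.close()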
Whit
You're partially right: hardware RAID does offload work from the CPU, but
it still generates interrupts in proportion to load, and the same goes for
server NICs with hardware queues - they handle the data outside the CPU,
so the CPU spends less time servicing peripherals, but not that much less.
In my experience any VM (Xen or qemu) should be pinned to a different set
of cores than the ones named by smp_affinity for the rx/tx queues and
other peripherals; if it isn't, then depending on overall system load you
may even get a freeze (my experience with a crappy LSI MegaSAS back in the
2.6.18 days), and almost every time a network benchmark loses a large
portion of its throughput.
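As an illustration only, here is a rough sketch of how the two fit
together with the libvirt Python bindings: read smp_affinity for the IRQs
you care about, then pin the guest's vcpus onto the remaining cores. The
IRQ numbers and the domain name below are placeholders - take the real
ones from /proc/interrupts and virsh list:

    # Sketch: pin a guest's vcpus away from the cores that service the
    # given IRQs. IRQ numbers and domain name are placeholders.
    import libvirt

    NIC_IRQS = [45, 46, 47, 48]            # example rx/tx queue IRQs
    DOMAIN = "guest1"                      # example domain name

    def irq_cpu_mask(irq):
        # CPU bitmask from /proc/irq/<irq>/smp_affinity, as an integer
        with open("/proc/irq/%d/smp_affinity" % irq) as f:
            return int(f.read().replace(",", ""), 16)

    conn = libvirt.open("qemu:///system")
    ncpus = conn.getInfo()[2]              # number of host CPUs

    reserved = 0
    for irq in NIC_IRQS:
        reserved |= irq_cpu_mask(irq)

    # Allow the guest only on the cores that do not service those IRQs
    # (make sure at least one core is left over before doing this).
    allowed = tuple(not (reserved >> cpu) & 1 for cpu in range(ncpus))

    dom = conn.lookupByName(DOMAIN)
    for vcpu in range(len(dom.vcpus()[0])):
        dom.pinVcpu(vcpu, allowed)         # same mask for every vcpu here
    conn.close()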
One more point when doing this kind of tuning: use your NUMA topology to
get the best performance - e.g. do NOT pin NIC interrupts to cores on the
neighboring socket, and don't assign ten vcpus of one VM to six real cores
split across two NUMA nodes.
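To see which cores share a NUMA node before deciding on such a layout,
the sysfs topology is enough (again just a sketch):

    # Print which host CPUs belong to which NUMA node, so guest vcpus and
    # NIC/HBA interrupts can be kept on the same node.
    import glob, os

    for node in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
        with open(os.path.join(node, "cpulist")) as f:
            print("%s: CPUs %s" % (os.path.basename(node), f.read().strip()))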