Re: [libvirt-users] How can a bridge be optimized?

On Tue, Jun 12, 2012 at 11:35:28AM +0400, Andrey Korolyov wrote:
Just a stupid question: did you pin the guest vCPUs or the NIC's hardware queues? An unpinned VM can seriously hurt I/O performance when it runs on the same core set as the NIC (or hardware RAID, FC, etc.).
Thanks. No question is stupid. Obviously, this shouldn't be so slow at VM I/O so I'm missing something. In my defense, there is no single coherent set of documents on this stuff, unless those are kept in a secret place. It would be a fine thing if a few of the people who know all the "obvious" stuff about libvirt-based KVM configuration would collaborate and document it fully somewhere.

When I Google "pinned vcpu" all the top responses are about Xen. I run KVM. I find mention that "KVM uses the linux scheduler for distributing workload rather than actually assigning physical CPUs to VMs," at http://serverfault.com/questions/235143/can-i-provision-half-a-core-as-a-vir.... Is that wrong? Are you suggesting RAID will end up on one CPU core, and if the VM ends up on the same core there's I/O trouble? I can see that for software RAID. But we're running hardware RAID. Isn't that handled in the hardware of the RAID controller? Isn't that the point of hardware RAID, to keep it off the CPU?

In any case the slow IO is true of all the VMs, so presumably they're properly spread out over CPU cores (of which there are more than VMs, and each VM is configured to take only one), which would rule out the general problem being the shared use of any one core by other system processes.

Whit

On Tue, Jun 12, 2012 at 08:07:44AM -0400, Whit Blauvelt wrote:
On Tue, Jun 12, 2012 at 11:35:28AM +0400, Andrey Korolyov wrote:
Just a stupid question: did you pin the guest vCPUs or the NIC's hardware queues? An unpinned VM can seriously hurt I/O performance when it runs on the same core set as the NIC (or hardware RAID, FC, etc.).
Thanks. No question is stupid. Obviously, this shouldn't be so slow at VM I/O so I'm missing something. In my defense, there is no single coherent set of documents on this stuff, unless those are kept in a secret place. It would be a fine thing if a few of the people who know all the "obvious" stuff about libvirt-based KVM configuration would collaborate and document it fully somewhere.
When I Google "pinned vcpu" all the top responses are about Xen. I run KVM. I find mention that "KVM uses the linux scheduler for distributing workload rather than actually assigning physical CPUs to VMs," at http://serverfault.com/questions/235143/can-i-provision-half-a-core-as-a-vir....
It is possible to pin to pCPUs in KVM using the same XML syntax as with Xen. First you can specify the overall VM's CPU affinity:

  <vcpu cpuset='1,2'>4</vcpu>

This creates a 4 vCPU guest where all the KVM threads are restricted to pCPUs 1 & 2. You can go further and lock down individual vCPUs to individual pCPUs by adding something like this:

  <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='2'/>
  </cputune>

While it is true that in general the Linux scheduler does a reasonable job of managing vCPUs, it is definitely possible to improve things by using explicit pinning, particularly if you are producing formal benchmarks. The downside of pinning is that if you have very variable workloads, you may lower your overall utilization by not letting the kernel move threads about on demand.

If your host machine has multiple NUMA nodes, it is well worth trying to pin a VM as a whole (<vcpu>) so that it fits inside a single NUMA node, but leave individual vCPUs free to float on the pCPUs in that node.

Daniel

--
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
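For anyone who would rather experiment before touching the domain XML, much the same pinning can also be applied to a running guest with virsh. This is only a sketch; "myguest" is a placeholder domain name and the CPU numbers are examples:

  # show each vCPU's current physical-CPU affinity
  virsh vcpuinfo myguest

  # pin vCPU 0 to pCPU 1, and vCPU 1 to pCPU 2
  virsh vcpupin myguest 0 1
  virsh vcpupin myguest 1 2

Changes made this way affect only the running guest, so for a permanent setting put the equivalent <cputune> block into the XML as above.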

On Tue, Jun 12, 2012 at 4:07 PM, Whit Blauvelt <whit@transpect.com> wrote:
On Tue, Jun 12, 2012 at 11:35:28AM +0400, Andrey Korolyov wrote:
Just a stupid question: did you pin the guest vCPUs or the NIC's hardware queues? An unpinned VM can seriously hurt I/O performance when it runs on the same core set as the NIC (or hardware RAID, FC, etc.).
Thanks. No question is stupid. Obviously, this shouldn't be so slow at VM I/O so I'm missing something. In my defense, there is no single coherent set of documents on this stuff, unless those are kept in a secret place. It would be a fine thing if a few of the people who know all the "obvious" stuff about libvirt-based KVM configuration would collaborate and document it fully somewhere.
When I Google "pinned vcpu" all the top responses are about Xen. I run KVM. I find mention that "KVM uses the linux scheduler for distributing workload rather than actually assigning physical CPUs to VMs," at http://serverfault.com/questions/235143/can-i-provision-half-a-core-as-a-vir.... Is that wrong? Are you suggesting RAID will end up on one CPU core, and if the VM ends up on the same core there's I/O trouble? I can see that for software RAID. But we're running hardware RAID. Isn't that handled in the hardware of the RAID controller? Isn't that the point of hardware RAID, to keep it off the CPU?
In any case the slow IO is true of all the VMs, so presumably they're properly spread out over CPU cores (of which there are more than VMs, and each VM is configured to take only one), which would rule out the general problem being the shared use of any one core by other system processes.
Whit
You're partially right: hardware RAID does offload the CPU, but it still needs a number of interrupts proportional to the load, and the same goes for server NICs with hardware queues - they help move data without involving the CPU, so the CPU spends less time servicing peripherals, but not that much less.

In my experience, any VM (Xen or qemu) should be pinned to a different set of cores than the ones pointed to by smp_affinity for the rx/tx queues and other peripherals. If not, then depending on overall system load you may even get a freeze (my experience with a crappy LSI MegaSAS back when 2.6.18 ruled), and almost every time a network benchmark loses a large portion of its throughput.

Another point when doing this kind of tuning: use your NUMA topology to get the best performance - e.g. do NOT pin the NIC's interrupts to cores on a neighboring socket, and do not assign ten cores of one VM to six real ones mixed between two NUMA nodes.
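To make that concrete, here is a rough sketch of the sort of checks being described on a typical Linux host; the interface name (eth0) and IRQ number (45) are placeholders and will differ per machine, and irqbalance, if it is running, may rewrite these masks behind your back:

  # which IRQ numbers belong to the NIC
  grep eth0 /proc/interrupts

  # the hex CPU mask an IRQ is currently allowed to run on
  cat /proc/irq/45/smp_affinity

  # restrict IRQ 45 to CPUs 0 and 1 (bitmask 0x3); needs root
  echo 3 > /proc/irq/45/smp_affinity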

On Tue, Jun 12, 2012 at 04:50:37PM +0400, Andrey Korolyov wrote:
You're partially right: hardware RAID does offload the CPU, but it still needs a number of interrupts proportional to the load, and the same goes for server NICs with hardware queues - they help move data without involving the CPU, so the CPU spends less time servicing peripherals, but not that much less.

In my experience, any VM (Xen or qemu) should be pinned to a different set of cores than the ones pointed to by smp_affinity for the rx/tx queues and other peripherals. If not, then depending on overall system load you may even get a freeze (my experience with a crappy LSI MegaSAS back when 2.6.18 ruled), and almost every time a network benchmark loses a large portion of its throughput.

Another point when doing this kind of tuning: use your NUMA topology to get the best performance - e.g. do NOT pin the NIC's interrupts to cores on a neighboring socket, and do not assign ten cores of one VM to six real ones mixed between two NUMA nodes.
Thanks Andrey. That gives me a lot of things to look up. Not that I'm asking for answers here. Just noting it in case anyone does get around to writing comprehensive docs on this.

How do I learn the NUMA topology? Where do I see where smp_affinity points? With VMs each configured to have only one virtual core (as they are in my case), to what degree are these complications less in play? What is the method for pinning NIC interrupts?

Again, not suggesting such answers belong on this list. But if anyone were to write a book on this stuff, those of us coming to virtualized systems from physical ones won't necessarily have ever been concerned with such questions. The existing introductory texts on libvirt, KVM and QEMU don't, IIRC, mention these things.

Best,
Whit
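For what it's worth, the topology questions at least have quick answers on most Linux hosts; the following is only a sketch (numactl may need to be installed separately):

  # NUMA layout: which CPUs and how much memory belong to each node
  numactl --hardware

  # CPU, socket and NUMA-node summary
  lscpu

  # where smp_affinity currently points for every IRQ (hex CPU masks)
  for irq in /proc/irq/[0-9]*; do
      echo "$irq: $(cat $irq/smp_affinity)"
  done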