On Sun, Jun 30, 2024 at 08:49:48PM +0200, Mikulas Patocka wrote:
> On Sun, 30 Jun 2024, Tejun Heo wrote:
>
> > Hello,
> >
> > On Sat, Jun 29, 2024 at 08:15:56PM +0200, Mikulas Patocka wrote:
> >
> > > With 6.5, we get 3600MiB/s; with 6.6 we get 1400MiB/s.
> > >
> > > The reason is that virt-manager by default sets up a topology where we
> > > have 16 sockets, 1 core per socket, 1 thread per core. And that workqueue
> > > patch avoids moving work items across sockets, so it processes all
> > > encryption work only on one virtual CPU.
> > >
> > > The performance degradation may be fixed with "echo 'system'
> > > >/sys/module/workqueue/parameters/default_affinity_scope" - but it is
> > > a regression anyway, as many users don't know about this option.
> > >
> > > How should we fix it? There are several options:
> > > 1. revert back to 'numa' affinity
> > > 2. revert to 'numa' affinity only if we are in a virtual machine
> > > 3. hack dm-crypt to set the 'numa' affinity for the affected workqueues
> > > 4. any other solution?
> >
> > Do you happen to know why libvirt is doing that? There are many other
> > implications to configuring the system that way and I don't think we want
> > to design kernel behaviors to suit topology information fed to VMs which
> > can be arbitrary.
> >
> > Thanks.
>
> I don't know why. I added users@lists.libvirt.org to the CC.
>
> How should libvirt properly advertise "we have 16 threads that are
> dynamically scheduled by the host kernel, so the latencies between them
> are changing and unpredictable"?
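
As an aside on the workaround quoted above: 'default_affinity_scope' is a
module parameter, and workqueues created with the WQ_SYSFS flag also get a
per-queue override under sysfs. A rough sketch, assuming a 6.5+ kernel;
'writeback' is just one example of a WQ_SYSFS queue, and dm-crypt's own
queues are not currently exported this way:

  # Module-wide default for unbound workqueues; can also be set at boot
  # with workqueue.default_affinity_scope= on the kernel command line.
  cat /sys/module/workqueue/parameters/default_affinity_scope
  echo system > /sys/module/workqueue/parameters/default_affinity_scope

  # Queues created with WQ_SYSFS can be overridden individually:
  ls /sys/devices/virtual/workqueue/
  echo system > /sys/devices/virtual/workqueue/writeback/affinity_scope
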
NB, libvirt is just the control plane; the actual virtual hardware exposed
to the guest is implemented across QEMU and the KVM kernel module. Guest
CPU topology and NUMA cost information are thus the responsibility of QEMU.
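
For illustration, the topology the guest sees is set by QEMU's -smp
option (libvirt expresses the same thing via the <topology> element in
the domain XML). A sketch of an invocation that presents the same 16
vCPUs as one socket with 16 cores rather than 16 sockets, with all
unrelated flags elided:

  qemu-system-x86_64 -accel kvm \
      -smp 16,sockets=1,cores=16,threads=1 \
      ...
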
When QEMU's virtual CPUs are floating freely across host CPUs there's
no perfect answer. The host admin needs to make a tradeoff in their
configuration:

They can optimize for density, by letting guest CPUs float freely and
allowing CPU overcommit against host CPUs; in that case the guest CPU
topology is essentially a lie.

They can optimize for predictable performance, by strictly pinning guest
CPUs 1:1 to host CPUs, minimizing CPU overcommit, and making the guest
CPU topology match the host CPU topology 1:1.
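
As a sketch of the latter approach using libvirt's tooling, for a
hypothetical 4-vCPU guest named "demo" (the host CPU numbers are
arbitrary):

  # Pin each vCPU 1:1 to a dedicated host CPU.
  for i in 0 1 2 3; do
      virsh vcpupin demo $i $i
  done

  # Keep QEMU's emulator threads off the pinned CPUs.
  virsh emulatorpin demo 4-7
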
With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|