Sent: Tuesday, October 24, 2023 at 5:28 PM
From: "Martin Kletzander" <mkletzan(a)redhat.com>
To: "daggs" <daggs(a)gmx.com>
Cc: libvir-list(a)redhat.com
Subject: Re: hdd kills vm
On Mon, Oct 23, 2023 at 04:59:08PM +0200, daggs wrote:
>Greetings Martin,
>
>> Sent: Sunday, October 22, 2023 at 12:37 PM
>> From: "Martin Kletzander" <mkletzan(a)redhat.com>
>> To: "daggs" <daggs(a)gmx.com>
>> Cc: libvir-list(a)redhat.com
>> Subject: Re: hdd kills vm
>>
>> On Fri, Oct 20, 2023 at 02:42:38PM +0200, daggs wrote:
>> >Greetings,
>> >
>> >I have a windows 11 vm running on my Gentoo using libvirt (9.8.0) + qemu
>> >(8.1.2), I'm passing almost all available resources to the vm
>> >(all 16 cpus, 31 out of 32 GB, nVidia gpu is passed through), but the
>> >performance is not good, the system lags and takes a long time to boot.
>>
>> There are a couple of things that stand out to me in your setup, and I'll
>> assume the host has one NUMA node with 8 cores, each with 2 threads,
>> just like you set it up in the guest XML.
>that's correct, see:
>$ lscpu | grep -i numa
>NUMA node(s): 1
>NUMA node0 CPU(s): 0-15
>
>however:
>$ dmesg | grep -i numa
>[ 0.003783] No NUMA configuration found
>
>can that be the reason?
>
No, this is fine; technically a single NUMA node is not really NUMA at
all, so that's expected.
Thanks for clarifying it for me.
>>
>> * When you give the guest all the CPUs the host has, there is nothing
>> left to run the host's own tasks. You might think that there "isn't
>> anything running", but there is: at minimum your init system, the
>> kernel, and the QEMU process that is emulating the guest. This is
>> definitely one of the bottlenecks.
>I've tried with 12 out of 16, same behavior.
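Reducing the count helps only if the host's leftover work also stays off
the guest's CPUs; one way to do that is pinning QEMU's emulator threads
to the freed CPUs. A minimal sketch, untested, where the cpuset values
are hypothetical and must match your real thread layout:

  <vcpu placement='static'>12</vcpu>
  <cputune>
    <!-- keep QEMU's emulator/IO threads on the CPUs the guest does
         not use (here: the last two cores, hypothetically) -->
    <emulatorpin cpuset='6-7,14-15'/>
  </cputune>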
>
>>
>> * The pinning of vCPUs to CPUs is half-suspicious. If you are trying to
>> make vCPU 0 and 1 be threads on the same core and on the host the
>> threads are represented as CPUs 0 and 8, then that's fine. If that is
>> just copy-pasted from somewhere, then it might not reflect the current
>> situation and can be a source of many scheduling issues (even once the
>> above is dealt with).
>I found a site that does it for you; if it is wrong, can you point me to
>a place where I can read about it?
>
Just check what the topology is on the host and try to match it with
the guest one. If in doubt, then try it without the pinning.
I can try to play with it; what I don't know is what the mapping logic
should be.
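Roughly, the logic is: find which host CPU numbers are the sibling
threads of each physical core, then pin the two vCPUs of each guest core
onto one such sibling pair. A sketch, assuming (hypothetically, as in
the example above) that host CPUs 0 and 8 are the two threads of core 0;
verify against your own machine first:

  # which host CPUs share a physical core?
  lscpu -e=CPU,CORE,SOCKET
  virsh capabilities | grep siblings

  # then mirror that in the guest XML, e.g.:
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>  <!-- guest core 0, thread 0 -->
    <vcpupin vcpu='1' cpuset='8'/>  <!-- guest core 0, thread 1 -->
    <vcpupin vcpu='2' cpuset='1'/>
    <vcpupin vcpu='3' cpuset='9'/>
    <!-- ...and so on for the remaining cores -->
  </cputune>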
>>
>> * I also seem to recall that Windows had some issues with systems that
>> have too many cores. I'm not sure whether that was down to an edition
>> difference or just to some older versions, or if the cores just did
>> not show up in the task manager, but there was something that was
>> fixed by using either more sockets or more cores in the topology.
>> This is probably not the issue for you, though.
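For reference, the sockets/cores/threads split is controlled by the
<topology> element of the guest XML; a sketch matching the 8-core,
2-thread host assumed above (values illustrative):

  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='8' threads='2'/>
  </cpu>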
>>
>> >after trying a few ways to fix it, I've concluded that the issue might
>> >be related to the way the hdd is defined at the vm level.
>> >here is the xml: https://bpa.st/MYTA
>> >I assume that the hdd sits on the sata ctrl causing the issue but I'm
>> >not sure what is the proper way to fix it, any ideas?
>> >
>>
>> It looks like your disk is on SATA, but I don't see why that would be
>> an issue. Passing the block device to QEMU as VirtIO shouldn't make
>> that much of a difference. Try measuring the speed of the disk on the
>> host and then in the VM, maybe. Is that an SSD or NVMe? I presume it's
>> not spinning rust, is it?
>as seen, I have 3 drives: 2 cdroms as sata and one passed-through hdd as
>virtio. I read somewhere that if the controller of the virtio device is
>sata, then it doesn't use virtio optimally.
Well, it _might_ be slightly more beneficial to use virtio-scsi or even
<disk type='block' device='lun'>, but I can't imagine that would make
the system lag. I'm not that familiar with the details.
Can I configure virtio-scsi and sata at the same time?
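As far as I know they can coexist: each disk simply selects its
controller through the bus attribute of its <target>. A sketch only
(untested; /dev/sdX and the ISO path are placeholders):

  <controller type='scsi' model='virtio-scsi'/>
  <disk type='block' device='disk'>
    <driver name='qemu' type='raw' cache='none' io='native'/>
    <source dev='/dev/sdX'/>
    <target dev='sda' bus='scsi'/>  <!-- served by virtio-scsi -->
  </disk>
  <disk type='file' device='cdrom'>
    <source file='/path/to/image.iso'/>
    <target dev='sdb' bus='sata'/>  <!-- stays on the SATA controller -->
    <readonly/>
  </disk>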
>it is a spindle, nvmes are too expensive where I live. frankly, I don't
>need lightning-fast boot; the other bare-metal machines running windows
>on a spindle run it quite fast, and they aren't half as fast as this
>server.
>
That might actually be related. The guest might think it is a different
type of disk and use completely suboptimal scheduling. It might be
solved by passing the disk as <disk device='lun'..., but at this point
I'm just guessing.
I'll look into that, thanks.
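For reference, the device='lun' variant mentioned above might look
roughly like this (a sketch, untested; it needs a SCSI controller such
as virtio-scsi, and /dev/sdX is a placeholder):

  <disk type='block' device='lun'>
    <driver name='qemu' type='raw'/>
    <source dev='/dev/sdX'/>
    <target dev='sda' bus='scsi'/>
  </disk>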
>>
>> >Thanks,
>> >
>> >Dagg.
>> >
>>
>