Greetings Laine,
Sent: Monday, August 16, 2021 at 12:57 AM
From: "Laine Stump" <laine(a)redhat.com>
To: "daggs" <daggs(a)gmx.com>
Cc: "Martin Kletzander" <mkletzan(a)redhat.com>, libvirt-users(a)redhat.com
Subject: Re: issues with vm after upgrade
On 8/14/21 6:05 AM, daggs wrote:
> Greetings Martin,
>
>> Sent: Thursday, August 12, 2021 at 2:07 PM
>> From: "daggs" <daggs(a)gmx.com>
>> To: "Martin Kletzander" <mkletzan(a)redhat.com>
>> Cc: dan(a)berrange.com, libvirt-users(a)redhat.com
>> Subject: Re: issues with vm after upgrade
>>
>>> Sent: Thursday, August 12, 2021 at 11:49 AM
>>> From: "Martin Kletzander" <mkletzan(a)redhat.com>
>>> To: "daggs" <daggs(a)gmx.com>
>>> Cc: dan(a)berrange.com, libvirt-users(a)redhat.com
>>> Subject: Re: issues with vm after upgrade
>>>
>>> On Wed, Aug 11, 2021 at 08:53:10PM +0200, daggs wrote:
>>>> Greetings Martin,
>>>>
>>>>
>>>>> Sent: Wednesday, August 11, 2021 at 6:08 PM
>>>>> From: "daggs" <daggs(a)gmx.com>
>>>>> To: "Martin Kletzander" <mkletzan(a)redhat.com>
>>>>> Cc: dan(a)berrange.com, libvirt-users(a)redhat.com
>>>>> Subject: Re: issues with vm after upgrade
>>>>>
>>>>> Greetings Martin,
>>>>>
>>>>>> Sent: Wednesday, August 11, 2021 at 4:13 PM
>>>>>> From: "Martin Kletzander" <mkletzan(a)redhat.com>
>>>>>> To: "daggs" <daggs(a)gmx.com>
>>>>>> Cc: dan(a)berrange.com, libvirt-users(a)redhat.com
>>>>>> Subject: Re: issues with vm after upgrade
>>>>>>
>>>>>> On Wed, Aug 11, 2021 at 03:09:34PM +0200, daggs wrote:
>>>>>>> Greetings Martin,
>>>>>>>
>>>>>>>> Sent: Wednesday, August 11, 2021 at 10:14 AM
>>>>>>>> From: "Martin Kletzander" <mkletzan(a)redhat.com>
>>>>>>>> To: "daggs" <daggs(a)gmx.com>
>>>>>>>> Cc: dan(a)berrange.com, libvirt-users(a)redhat.com
>>>>>>>> Subject: Re: issues with vm after upgrade
>>>>>>>>
>>>
>>> [...]
>>>
>>>>>>>>
>>>>>>>> 2) To your issue with starting the domain it would be good to know what
>>>>>>>> is the error you get from virsh (or however you are starting the
>>>>>>>> domain) and the debug logs of libvirtd, ideally just for the part of
>>>>>>>> the domain starting.
>>>>>>> that is the issue, there wasn't any error. the vm just didn't boot.
>>>>>>
>>>>>> Oh, so I misunderstood. What was the state of the VM in libvirt?
>>>>>> "paused" or "running"? Was the serial console working?
>>>>> it was marked as running and there was no serial
>>>>>
>>>
>>> That's a pity we could not examine what was actually happening.
>>>
>>>>>>
>>>>>>> I can diff the original xml with the new one to see the diffs and
>>>>>>> post them here if you wish
>>>>>>>
>>>>>>
>>>>>> Would be nice to see if there are any differences. The newly created
>>>>>> one works then?
>>>>>>
>>>>>
>>>>> I'll send it later today
>>>>>
>>>>
>>>> here: https://dpaste.com/5VBUU8Z9W
>>>>
>>>
>>> Unfortunately there are many differences there. The machine type
>>> changes _something_ in qemu, there is different PCI(e) topology, and I
>>> do not think I will be able to figure this out without the non-working
>>> machine.
>>>
>>> So if your current setup works for you right now I'd leave figuring out
>>> the previous issue to others, if there is anyone wanting to figure out
>>> if there is some libvirt issue.
>>>
>>> Have a nice day
>>>
>>
>> my current setup works besides the hdmi audio, which I still need to investigate.
>>
>> thanks for your help.
>>
>> Dagg
>>
>
> just to update, I've solved the sound issue. frankly, I don't understand how
> the guest showed a sound card in the first place.
> from what I gather, libvirt sets the -nodefaults flag so that it can build the
> vm's hardware from scratch.
> on the host machine, the sound card is a function of a device in the pci tree.
> when libvirt created the pci tree for the guest, it placed the card as a
> function of a device as well, in my case 02:00.2,
> however it didn't create a device at 02:00.0.
Are you basing this claim on the libvirt XML? Or on what you see with
lspci in the guest?
When libvirt is assigning PCI addresses to devices in a guest, it will
never auto-assign a non-0 function. This will only happen if the user
explicitly requests it (and even then, iirc, libvirt should generate an
error if function 0 of the same slot has no device - something to the
effect of "no device on function 0 of a multifunction device").
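
As an aside, the rule described above can be checked mechanically against a domain XML dump. The following is only a minimal sketch: the function name and the sample XML (including its addresses) are made up for illustration, and it is not how libvirt itself validates addresses. It looks only at guest-side <address type='pci'> elements that are direct children of a device, so the host address inside a hostdev's <source> is ignored.

```python
import xml.etree.ElementTree as ET


def orphaned_functions(domain_xml: str):
    """Return (domain, bus, slot, function) tuples for guest PCI devices that
    sit on a non-zero function while function 0 of the same slot is empty.

    Hypothetical helper for illustration only; not part of libvirt.
    """
    root = ET.fromstring(domain_xml)
    used = set()
    for dev in root.findall("./devices/*"):
        # Direct child only: this is the guest-side address of the device.
        addr = dev.find("address")
        if addr is None or addr.get("type") != "pci":
            continue
        used.add(tuple(int(addr.get(key, "0"), 16)
                       for key in ("domain", "bus", "slot", "function")))
    return sorted(a for a in used if a[3] != 0 and a[:3] + (0,) not in used)


# Made-up example: a hostdev on function 2 of a slot with nothing on function 0.
SAMPLE = """
<domain type='kvm'>
  <devices>
    <controller type='pci' index='1' model='pcie-root-port'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x2'/>
    </hostdev>
  </devices>
</domain>
"""

print(orphaned_functions(SAMPLE))
```

Feeding it the output of "virsh dumpxml" for a domain would list any device placed on a non-0 function of an otherwise empty slot; an empty list means no such placement exists in the XML.
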
Anyway, when I looked back at the XML diff you posted earlier (see
below), I didn't see any hostdev device assigned to 02:00.2. What I
*did* see was that in both the old and the new version of the diff, the
hostdev devices were assigned to function 0 of different *slots* on a
dmi-to-pci-bridge controller, which should cause no problems (unless
there is a bug in QEMU's dmi-to-pci-bridge). (The important thing,
though, is that there is no hostdev device on a non-0 function, and when
it is on a non-0 slot, that's because it's on a dmi-to-pci-bridge (which
has 32 slots).)

I saw it in the guest. I'd assume that if libvirt defines a device
on a specific bdf, the guest will not change it.
in fact, over the last 10 years I've booted thousands of systems, both bare metal and
virtualized, and never encountered such a scenario.
that said, it might be a bug in qemu.
what I did see is that in the old vm's guest, after the upgrade, the sound card was
defined as a function of the scsi virtblk controller, while the new vm placed
it as a function of a nonexistent device.

On the topic of having a dmi-to-pci-bridge show up in your XML: I don't
remember what versions the changes were in (it was at least a year or
two ago), but only a fairly old version of libvirt would do that - 1)
recent libvirt will assume that any hostdev PCI device is a PCIe device,
so it will add a pcie-root-port and assign the hostdev device to slot 0
of that root-port, and even before that 2) we switched from using
dmi-to-pci-bridge to using pcie-to-pci-bridge quite some time ago as well.

as stated in the original mail, the issue started after a major version upgrade of both
libvirt and qemu.
I'm currently using the latest stable versions, afaik.

So if you're generating new XML based on config that doesn't have pci
controllers already in it, and you're seeing hostdevs (or any other PCI
devices) assigned to an automatically-added dmi-to-pci-bridge, then your
libvirt version is severely out of date.

here are the versions I'm using:
# emerge --search app-emulation/libvirt app-emulation/qemu
[ Results for search key : app-emulation/libvirt ]
Searching...
* app-emulation/libvirt
Latest version available: 7.5.0
Latest version installed: 7.5.0
Size of files: 9749 KiB
Homepage: https://www.libvirt.org/ https://gitlab.com/libvirt/libvirt/
Description: C toolkit to manipulate virtual machines
License: LGPL-2.1
[ Applications found : 1 ]
[ Results for search key : app-emulation/qemu ]
Searching...
* app-emulation/qemu
Latest version available: 6.0.0-r52
Latest version installed: 6.0.0-r52
Size of files: 22724 KiB
Homepage: http://www.qemu.org http://www.linux-kvm.org
Description: QEMU + Kernel-based Virtual Machine userland tools
License: GPL-2 LGPL-2 BSD-2
[ Applications found : 1 ]
On 8/11/21 2:53 PM, daggs wrote:
>> From: "daggs" <daggs(a)gmx.com>
>>> From: "Martin Kletzander" <mkletzan(a)redhat.com>
>>> On Wed, Aug 11, 2021 at 03:09:34PM +0200, daggs wrote:
>>>> I can diff the original xml with the new one to see the diffs and
>>>> post them here if you wish
>>>>
>>>
>>> Would be nice to see if there are any differences. The newly created
>>> one works then?
>>
>> I'll send it later today
>>
>
> here: https://dpaste.com/5VBUU8Z9W
> my fix was to move the device to 00:1f.4 in the guest.
That's an interesting choice :-). You could have just put it on function
0 of some other unused slot (or a non-0 function of the slot the GPU is
assigned to). 00:1f is used for integrated devices on the Q35 chipset -
it's nice that QEMU's emulation code was written to allow adding more
devices on that slot, but I wouldn't have been surprised if it had
caused problems...

10 years of working in a virtualization company have taught me
that sometimes keeping the pci structure as close as possible
to the original is the best way to go.
that is why I made it a function: it is a function on the host machine.

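
For readers who want to try a placement such as 00:1f.4, the lspci-style bus:slot.function notation maps directly onto the attributes of a libvirt <address type='pci'> element. Below is a small sketch of that mapping; the helper name is invented for illustration and the domain defaults to 0x0000 when the address omits it.

```python
import re


def bdf_to_address_xml(bdf: str) -> str:
    """Convert '[dddd:]bb:ss.f' (hex, as printed by lspci) into a libvirt
    <address type='pci'> element. Hypothetical helper for illustration."""
    match = re.fullmatch(
        r"(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])", bdf)
    if match is None:
        raise ValueError(f"not a PCI address: {bdf!r}")
    domain, bus, slot, function = (int(g or "0", 16) for g in match.groups())
    if slot > 0x1F:
        # A PCI bus has at most 32 slots (0x00 to 0x1f).
        raise ValueError(f"slot out of range in {bdf!r}")
    return (f"<address type='pci' domain='0x{domain:04x}' bus='0x{bus:02x}' "
            f"slot='0x{slot:02x}' function='0x{function:x}'/>")


print(bdf_to_address_xml("00:1f.4"))
```

The resulting element would go inside the <hostdev> element as its guest-side address, next to (not inside) the <source> element that names the host device.
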
> I wouldn't be surprised if this was the reason the vm didn't boot after the
> upgrade with the old xml.
Well, if your XML had a device assigned to a non-0 function of a slot
and no device in function 0 of that slot, it would have failed to work
previously as well (my recollection is that in this case it's more a
problem of the guest OS not probing non-0 functions when there is
nothing on function 0, and not with anything done by QEMU).
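
The guest-side behaviour described above follows from how PCI enumeration conventionally works: the OS reads function 0 of each slot first, and only looks at functions 1 to 7 if function 0 exists and has the multifunction bit (bit 7 of the header type register) set. The sketch below models that logic on a plain dictionary; it is a simplified illustration, not the code of any particular guest OS, and the function name and data layout are invented here.

```python
MULTIFUNCTION_BIT = 0x80  # bit 7 of the PCI header type register


def visible_functions(slot):
    """Return the function numbers a conventional enumerator would discover.

    `slot` maps function number -> header type byte for every function that
    is actually present in one PCI slot (a made-up model for illustration).
    """
    if 0 not in slot:
        # Function 0 reads back as absent, so the whole slot is skipped.
        return []
    found = [0]
    if slot[0] & MULTIFUNCTION_BIT:
        # Only a multifunction device has its other functions probed.
        found += [fn for fn in range(1, 8) if fn in slot]
    return found


print(visible_functions({2: 0x00}))            # device only on function 2
print(visible_functions({0: 0x80, 2: 0x00}))   # multifunction device on function 0
```

In this model, a device that exists only on function 2 is never discovered, while the same device becomes visible once something with the multifunction bit set occupies function 0 of the slot.
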

here is the xml of the machine after I recreated it; it worked but had no sound:
https://dpaste.com/BB9EDY6BK
I used virt-manager. note that the passed-through sound card is placed as a function on
bus 0x8, which doesn't exist.
Dagg.