On 8/14/21 6:05 AM, daggs wrote:
Greetings Martin,
> Sent: Thursday, August 12, 2021 at 2:07 PM
> From: "daggs" <daggs(a)gmx.com>
> To: "Martin Kletzander" <mkletzan(a)redhat.com>
> Cc: dan(a)berrange.com, libvirt-users(a)redhat.com
> Subject: Re: issues with vm after upgrade
>
>> Sent: Thursday, August 12, 2021 at 11:49 AM
>> From: "Martin Kletzander" <mkletzan(a)redhat.com>
>> To: "daggs" <daggs(a)gmx.com>
>> Cc: dan(a)berrange.com, libvirt-users(a)redhat.com
>> Subject: Re: issues with vm after upgrade
>>
>> On Wed, Aug 11, 2021 at 08:53:10PM +0200, daggs wrote:
>>> Greetings Martin,
>>>
>>>
>>>> Sent: Wednesday, August 11, 2021 at 6:08 PM
>>>> From: "daggs" <daggs(a)gmx.com>
>>>> To: "Martin Kletzander" <mkletzan(a)redhat.com>
>>>> Cc: dan(a)berrange.com, libvirt-users(a)redhat.com
>>>> Subject: Re: issues with vm after upgrade
>>>>
>>>> Greetings Martin,
>>>>
>>>>> Sent: Wednesday, August 11, 2021 at 4:13 PM
>>>>> From: "Martin Kletzander" <mkletzan(a)redhat.com>
>>>>> To: "daggs" <daggs(a)gmx.com>
>>>>> Cc: dan(a)berrange.com, libvirt-users(a)redhat.com
>>>>> Subject: Re: issues with vm after upgrade
>>>>>
>>>>> On Wed, Aug 11, 2021 at 03:09:34PM +0200, daggs wrote:
>>>>>> Greetings Martin,
>>>>>>
>>>>>>> Sent: Wednesday, August 11, 2021 at 10:14 AM
>>>>>>> From: "Martin Kletzander" <mkletzan(a)redhat.com>
>>>>>>> To: "daggs" <daggs(a)gmx.com>
>>>>>>> Cc: dan(a)berrange.com, libvirt-users(a)redhat.com
>>>>>>> Subject: Re: issues with vm after upgrade
>>>>>>>
>>
>> [...]
>>
>>>>>>>
>>>>>>> 2) To your issue with starting the domain it would be good to know what
>>>>>>> is the error you get from virsh (or however you are starting the
>>>>>>> domain) and the debug logs of libvirtd, ideally just for the part of
>>>>>>> the domain starting.
>>>>>> that is the issue, there wasn't any error. the vm just didn't boot.
>>>>>
>>>>> Oh, so I misunderstood. What was the state of the VM in libvirt?
>>>>> "paused" or "running"? Was the serial console working?
>>>> it was marked as running and there was no serial
>>>>
>>
>> That's a pity we could not examine what was actually happening.
>>
>>>>>
>>>>>> I can diff the original xml with the new one to see the diffs and
>>>>>> post them here if you wish
>>>>>>
>>>>>
>>>>> Would be nice to see if there are any differences. The newly created
>>>>> one works then?
>>>>>
>>>>
>>>> I'll send it later today
>>>>
>>>
>>> here: https://dpaste.com/5VBUU8Z9W
>>>
>>
>> Unfortunately there are many differences there. The machine type
>> changes _something_ in qemu, there is different PCI(e) topology, and I
>> do not think I will be able to figure this out without the non-working
>> machine.
>>
>> So if your current setup works for you right now I'd leave figuring out
>> the previous issue to others, if there is anyone wanting to figure out
>> if there is some libvirt issue.
>>
>> Have a nice day
>>
>
> my current setup works besides the hdmi audio, which I still need to investigate.
>
> thanks for your help.
>
> Dagg
>
just to update, I've solved the sound issue. frankly, I don't understand how the
guest showed a sound card in the first place.
from what I gather, libvirt passes QEMU the -nodefaults flag so that it can build
the vm's properties from scratch.
in this situation, the sound card is a function in the host machine's pci tree.
when libvirt created the pci tree for the guest, it placed the card as a function of a
device as well, in my case 02:00.2.
however, it didn't create a device at 02:00.0.

Are you basing this claim on the libvirt XML? Or on what you see with
lspci in the guest?

When libvirt is assigning PCI addresses to devices in a guest, it will
never auto-assign a non-0 function. This will only happen if the user
explicitly requests it (and even then, iirc, libvirt should generate an
error if function 0 of the same slot has no device - something to the
effect of "no device on function 0 of a multifunction device").

Anyway, when I looked back at the XML diff you posted earlier (see
below), I didn't see any hostdev device assigned to 02:00.2. What I
*did* see was that in both the old and the new version of the diff, the
hostdev devices were assigned to function 0 of different *slots* on a
dmi-to-pci-bridge controller, which should cause no problems (unless
there is a bug in QEMU's dmi-to-pci-bridge). (The important thing,
though, is that there is no hostdev device on a non-0 function, and when
it is on a non-0 slot, that's because it's on a dmi-to-pci-bridge, which
has 32 slots.)

On the topic of having a dmi-to-pci-bridge show up in your XML: I don't
remember what versions the changes were in (it was at least a year or
two ago), but only a fairly old version of libvirt would do that: 1)
recent libvirt will assume that any hostdev PCI device is a PCIe device,
so it will add a pcie-root-port and assign the hostdev device to slot 0
of that root-port, and even before that, 2) we switched from using
dmi-to-pci-bridge to using pcie-to-pci-bridge quite some time ago as well.

So if you're generating new XML based on config that doesn't have pci
controllers already in it, and you're seeing hostdevs (or any other PCI
devices) assigned to an automatically-added dmi-to-pci-bridge, then your
libvirt version is severely out of date.

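For anyone wanting to check a saved config for this, here is a minimal Python
sketch. It assumes the domain XML was saved with something like `virsh dumpxml`.
The SAMPLE_XML and the helper names are made up for illustration and are not
taken from the posted diff. It relies on the libvirt convention that the bus
number in a device's PCI <address> refers to the PCI controller with the same
index.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed domain XML (illustration only).
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='dmi-to-pci-bridge'/>
    <controller type='pci' index='2' model='pcie-root-port'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
    </hostdev>
  </devices>
</domain>
"""


def pci_controllers_by_model(domain_xml, model):
    """Return the index of every PCI controller with the given model."""
    root = ET.fromstring(domain_xml)
    return [
        int(ctrl.get("index"))
        for ctrl in root.findall("./devices/controller[@type='pci']")
        if ctrl.get("model") == model
    ]


def devices_on_bus(domain_xml, bus_index):
    """Return the tag names of devices whose PCI address is on bus_index."""
    root = ET.fromstring(domain_xml)
    hits = []
    for dev in root.findall("./devices/*"):
        addr = dev.find("address[@type='pci']")
        # libvirt writes bus numbers as hex strings such as '0x01'
        if addr is not None and int(addr.get("bus", "0"), 16) == bus_index:
            hits.append(dev.tag)
    return hits


if __name__ == "__main__":
    for index in pci_controllers_by_model(SAMPLE_XML, "dmi-to-pci-bridge"):
        print("dmi-to-pci-bridge at index", index,
              "carries:", devices_on_bus(SAMPLE_XML, index))
```

If this reports devices on a dmi-to-pci-bridge in XML that libvirt generated
without any pre-existing pci controllers, that matches the "old libvirt" case
described above.
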
On 8/11/21 2:53 PM, daggs wrote:
> From: "daggs" <daggs(a)gmx.com>
>> From: "Martin Kletzander" <mkletzan(a)redhat.com>
>> On Wed, Aug 11, 2021 at 03:09:34PM +0200, daggs wrote:
>>> I can diff the original xml with the new one to see the diffs and
>>> post them here if you wish
>>>
>>
>> Would be nice to see if there are any differences. The newly created
>> one works then?
>
> I'll send it later today
>
here:
https://dpaste.com/5VBUU8Z9W

my fix was to move the device to 00:1f.4 in the guest.

That's an interesting choice :-). You could have just put it on function
0 of some other unused slot (or a non-0 function of the slot the GPU is
assigned to). 00:1f is used for integrated devices on the Q35 chipset -
it's nice that QEMU's emulation code was written to allow adding more
devices on that slot, but I wouldn't have been surprised if it had
caused problems...

I wouldn't be surprised if this was the reason the vm didn't
boot after the upgrade with the old xml.

Well, if your XML had a device assigned to a non-0 function of a slot
and no device in function 0 of that slot, it would have failed to work
previously as well (my recollection is that in this case it's more a
problem of the guest OS not probing non-0 functions when there is
nothing on function 0, and not with anything done by QEMU).
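
The "non-0 function with nothing on function 0" condition can be checked
directly against the domain XML. A minimal Python sketch, assuming the XML
comes from something like `virsh dumpxml`; the SAMPLE_XML and the function name
are hypothetical, and the 02:00.2 address is only borrowed from the discussion
above to show what a hit would look like.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily trimmed domain XML (illustration only).
# Bus 1 slot 0 has functions 0 and 1 (fine); bus 2 slot 0 has only function 2.
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00'
               function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x2'/>
    </hostdev>
  </devices>
</domain>
"""


def slots_missing_function_zero(domain_xml):
    """Return (bus, slot, functions) for guest slots with no device on function 0."""
    root = ET.fromstring(domain_xml)
    functions_by_slot = defaultdict(set)
    # Only the guest-side address (a direct child of the device element) is
    # considered; the host-side address nested under <source> is skipped.
    for addr in root.findall("./devices/*/address[@type='pci']"):
        bus = int(addr.get("bus", "0"), 16)
        slot = int(addr.get("slot", "0"), 16)
        function = int(addr.get("function", "0"), 16)
        functions_by_slot[(bus, slot)].add(function)
    return [
        (bus, slot, sorted(funcs))
        for (bus, slot), funcs in sorted(functions_by_slot.items())
        if 0 not in funcs
    ]


if __name__ == "__main__":
    for bus, slot, funcs in slots_missing_function_zero(SAMPLE_XML):
        print(f"{bus:02x}:{slot:02x} has functions {funcs} but nothing on function 0")
```

An empty result means every occupied guest slot has a device on function 0,
which is the situation described above for the posted diff.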