[RFC] exposing 'nodedev assigned to domain' info to users

Hi,

This is something I've been giving thought to after working on GitLab issue #72, and I decided to run it through the ML before hitting the code.

We don't have an easy way to retrieve the domain that is using a specific hostdev. Let's say that I want to know which domain is using the PCI card pci_0000_01_00_2. 'nodedev-dumpxml' will return the hardware/driver capabilities of the device, such as IOMMU group, driver and so on, but will not tell you which domain is using the hostdev, if any. 'nodedev-list' will simply list all nodedev names known to libvirt, without outputting any other information.

IIUC, the only existing way I can reliably tell whether a hostdev is being used by a domain, aside from registering the information myself during domain definition of course, is to run 'virsh dumpxml <domain>' for each running domain and match the nodedev name with the source.address element of the XML.

When we consider SR-IOV devices that can have 28+ VFs each (and lots of fun caveats, as GitLab #72 showed us), the ability to hot plug/unplug hostdevs freely, and lots of running domains, it is clear that we're putting considerable pressure on the upper layers (oVirt, or a poor human admin) to keep track of the nodedevs each running domain is using. This is information we already have internally and could simply expose.

I have a few ideas to make this happen:

1 - upgrade 'nodedev-list' to add an extra 'assigned to' column

This is the most straightforward way of exposing the info. A simple 'nodedev-list' call can retrieve which domain is using which nodedev. To preserve the existing usage we can add a "--show-assigned-domains" option to control whether we display this info.

2 - add an '<assigned_to>' element to the nodedev XML definition

I'm not a fan of exposing this in this particular XML because we would mix host/hw related attributes with domain info. But it would be easier to pull off compared to (1), so I'm mentioning it for the record.

I would start by exposing the info for HOSTDEV_SUBSYS_TYPE_PCI hostdevs (--cap pci in nodedev-list).

Thanks,
DHB
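[Editor's note] For illustration, here is roughly what that per-domain scan looks like today. This is only a sketch: it assumes xmllint is available and that the nodedev name pci_0000_01_00_2 corresponds to the PCI address 0000:01:00.2.

  # check every running domain for a <hostdev> whose source matches the VF
  for dom in $(virsh list --name); do
      virsh dumpxml "$dom" | xmllint --xpath \
          "//hostdev/source/address[@domain='0x0000' and @bus='0x01' and @slot='0x00' and @function='0x2']" \
          - >/dev/null 2>&1 && echo "pci_0000_01_00_2 is assigned to: $dom"
  done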

On Tue, Jan 05, 2021 at 05:18:13PM -0300, Daniel Henrique Barboza wrote:
Hi,
This is something I've been giving thought to after working on GitLab issue #72, and I decided to run it through the ML before hitting the code.

We don't have an easy way to retrieve the domain that is using a specific hostdev. Let's say that I want to know which domain is using the PCI card pci_0000_01_00_2. 'nodedev-dumpxml' will return the hardware/driver capabilities of the device, such as IOMMU group, driver and so on, but will not tell you which domain is using the hostdev, if any. 'nodedev-list' will simply list all nodedev names known to libvirt, without outputting any other information.

IIUC, the only existing way I can reliably tell whether a hostdev is being used by a domain, aside from registering the information myself during domain definition of course, is to run 'virsh dumpxml <domain>' for each running domain and match the nodedev name with the source.address element of the XML.

When we consider SR-IOV devices that can have 28+ VFs each (and lots of fun caveats, as GitLab #72 showed us), the ability to hot plug/unplug hostdevs freely, and lots of running domains, it is clear that we're putting considerable pressure on the upper layers (oVirt, or a poor human admin) to keep track of the nodedevs each running domain is using. This is information we already have internally and could simply expose.
I have a few ideas to make this happen:
1 - upgrade 'nodedev-list' to add an extra 'assigned to' column
This is the most straightforward way of exposing the info. A simple 'nodedev-list' call can retrieve which domain is using which nodedev. To preserve the existing usage we can add a "--show-assigned-domains" option to control whether we display this info.

That would mean nodedev-list has to fetch the XML for every running guest, parse it, and extract the info. That's not a scalable solution.

2 - add an '<assigned_to>' element to the nodedev XML definition

I'm not a fan of exposing this in this particular XML because we would mix host/hw related attributes with domain info. But it would be easier to pull off compared to (1), so I'm mentioning it for the record.
This is similar to what we do for the nwfilter-binding and net-port XML where we have an <owner> element present.

The complication here is that right now we don't ever touch the nodedev driver when doing host device assignment, and so don't especially want to introduce a dependency.

Regards,
Daniel

--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
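[Editor's note] For context, the net-port XML records its user in an <owner> element containing the domain's name and UUID. A hypothetical nodedev equivalent for option 2 (not an existing schema, just a sketch; the domain name and UUID are made up) might look like:

  <device>
    <name>pci_0000_01_00_2</name>
    ...
    <!-- hypothetical element, mirroring the net-port <owner> -->
    <assigned_to>
      <name>guest1</name>
      <uuid>b1ae8bf6-38b0-4c81-9d44-78ce3f520496</uuid>
    </assigned_to>
  </device>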

On 1/6/21 7:09 AM, Daniel P. Berrangé wrote:
On Tue, Jan 05, 2021 at 05:18:13PM -0300, Daniel Henrique Barboza wrote:
Hi,
This is something I've been giving thought to after working on GitLab issue #72, and I decided to run it through the ML before hitting the code.

We don't have an easy way to retrieve the domain that is using a specific hostdev. Let's say that I want to know which domain is using the PCI card pci_0000_01_00_2. 'nodedev-dumpxml' will return the hardware/driver capabilities of the device, such as IOMMU group, driver and so on, but will not tell you which domain is using the hostdev, if any. 'nodedev-list' will simply list all nodedev names known to libvirt, without outputting any other information.

IIUC, the only existing way I can reliably tell whether a hostdev is being used by a domain, aside from registering the information myself during domain definition of course, is to run 'virsh dumpxml <domain>' for each running domain and match the nodedev name with the source.address element of the XML.

When we consider SR-IOV devices that can have 28+ VFs each (and lots of fun caveats, as GitLab #72 showed us), the ability to hot plug/unplug hostdevs freely, and lots of running domains, it is clear that we're putting considerable pressure on the upper layers (oVirt, or a poor human admin) to keep track of the nodedevs each running domain is using. This is information we already have internally and could simply expose.
I have a few ideas to make this happen:
1 - upgrade 'nodedev-list' to add an extra 'assigned to' column
This is the most straightforward way of exposing the info. A simple 'nodedev-list' call can retrieve which domain is using which nodedev. To preserve the existing usage we can add a "--show-assigned-domains" option to control whether we display this info.

That would mean nodedev-list has to fetch the XML for every running guest, parse it, and extract the info. That's not a scalable solution.

2 - add an '<assigned_to>' element to the nodedev XML definition

I'm not a fan of exposing this in this particular XML because we would mix host/hw related attributes with domain info. But it would be easier to pull off compared to (1), so I'm mentioning it for the record.
This is similar to what we do for the nwfilter-binding and net-port XML where we have an <owner> element present.
The complication here is that right now we don't ever touch the nodedev driver when doing host device assignment, and so don't especially want to introduce a dependency.
One possible alternative would be a new API that operates on hostdevs instead of nodedevs. "hostdev-list" would list the devices assigned to any domain, as opposed to "nodedev-list", which lists all nodedevs of the host. I'm not sure whether this differentiation between hostdev and nodedev (i.e. a hostdev is a nodedev that is assigned to a domain) would be clear enough to users though. We would need to document it more clearly.

Yet another alternative is a new API under "Device Commands". We already have attach-device, detach-device and so on, might as well have a new "list-devices" that does the deed. This fits with the "The following commands manipulate devices associated to domains." claim that we make about this class of commands.

Thanks,
DHB
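[Editor's note] To make the two alternatives concrete, usage could look something like the following. Neither command exists; the names, arguments and output are illustrative only:

  # alternative A: list only nodedevs that are assigned, with their domain
  $ virsh hostdev-list
   Name               Domain
  ----------------------------
   pci_0000_01_00_2   guest1

  # alternative B: a per-domain listing under "Device Commands"
  $ virsh list-devices guest1
  pci_0000_01_00_2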
Regards, Daniel

On Wed, Jan 06, 2021 at 08:00:52AM -0300, Daniel Henrique Barboza wrote:
On 1/6/21 7:09 AM, Daniel P. Berrangé wrote:
On Tue, Jan 05, 2021 at 05:18:13PM -0300, Daniel Henrique Barboza wrote:
Hi,
This is something I've been giving thought to after working on GitLab issue #72, and I decided to run it through the ML before hitting the code.

We don't have an easy way to retrieve the domain that is using a specific hostdev. Let's say that I want to know which domain is using the PCI card pci_0000_01_00_2. 'nodedev-dumpxml' will return the hardware/driver capabilities of the device, such as IOMMU group, driver and so on, but will not tell you which domain is using the hostdev, if any. 'nodedev-list' will simply list all nodedev names known to libvirt, without outputting any other information.

IIUC, the only existing way I can reliably tell whether a hostdev is being used by a domain, aside from registering the information myself during domain definition of course, is to run 'virsh dumpxml <domain>' for each running domain and match the nodedev name with the source.address element of the XML.

When we consider SR-IOV devices that can have 28+ VFs each (and lots of fun caveats, as GitLab #72 showed us), the ability to hot plug/unplug hostdevs freely, and lots of running domains, it is clear that we're putting considerable pressure on the upper layers (oVirt, or a poor human admin) to keep track of the nodedevs each running domain is using. This is information we already have internally and could simply expose.
I have a few ideas to make this happen:
1 - upgrade 'nodedev-list' to add an extra 'assigned to' column
This is the most straightforward way of exposing the info. A simple 'nodedev-list' call can retrieve which domain is using which nodedev. To preserve the existing usage we can add a "--show-assigned-domains" option to control whether we display this info.

That would mean nodedev-list has to fetch the XML for every running guest, parse it, and extract the info. That's not a scalable solution.

2 - add an '<assigned_to>' element to the nodedev XML definition

I'm not a fan of exposing this in this particular XML because we would mix host/hw related attributes with domain info. But it would be easier to pull off compared to (1), so I'm mentioning it for the record.
This is similar to what we do for the nwfilter-binding and net-port XML where we have an <owner> element present.
The complication here is that right now we don't ever touch the nodedev driver when doing host device assignment, and so don't especially want to introduce a dependency.

One possible alternative would be a new API that operates on hostdevs instead of nodedevs. "hostdev-list" would list the devices assigned to any domain, as opposed to "nodedev-list", which lists all nodedevs of the host. I'm not sure whether this differentiation between hostdev and nodedev (i.e. a hostdev is a nodedev that is assigned to a domain) would be clear enough to users though. We would need to document it more clearly.

Wasn't this about the connection to the nodedev though? E.g. with mdevs we only have a UUID in the domain XML, which doesn't tell you anything about the device or its parent, and you also can't take the UUID and find the corresponding nodedev entry for it (well, you can hack it so that you construct the resulting nodedev name). Maybe I'm just misunderstanding the use case though.

Erik
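[Editor's note] The hack Erik refers to relies on the nodedev naming scheme for mdevs at the time, i.e. an 'mdev_' prefix plus the UUID with dashes replaced by underscores. A shell sketch (the UUID is an example taken from a domain's <hostdev> entry):

  uuid=b1ae8bf6-38b0-4c81-9d44-78ce3f520496
  virsh nodedev-dumpxml "mdev_${uuid//-/_}"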
Yet another alternative is a new API under "Device Commands". We already have attach-device, detach-device and so on, might as well have a new "list-devices" that does the deed. This fits with the "The following commands manipulate devices associated to domains." claim that we make about this class of commands.
Thanks,
DHB
Regards, Daniel

On 1/6/21 8:13 AM, Erik Skultety wrote:
On Wed, Jan 06, 2021 at 08:00:52AM -0300, Daniel Henrique Barboza wrote:
On 1/6/21 7:09 AM, Daniel P. Berrangé wrote:
On Tue, Jan 05, 2021 at 05:18:13PM -0300, Daniel Henrique Barboza wrote:
Hi,
This is something I've been giving thought to after working on GitLab issue #72, and I decided to run it through the ML before hitting the code.

We don't have an easy way to retrieve the domain that is using a specific hostdev. Let's say that I want to know which domain is using the PCI card pci_0000_01_00_2. 'nodedev-dumpxml' will return the hardware/driver capabilities of the device, such as IOMMU group, driver and so on, but will not tell you which domain is using the hostdev, if any. 'nodedev-list' will simply list all nodedev names known to libvirt, without outputting any other information.

IIUC, the only existing way I can reliably tell whether a hostdev is being used by a domain, aside from registering the information myself during domain definition of course, is to run 'virsh dumpxml <domain>' for each running domain and match the nodedev name with the source.address element of the XML.

When we consider SR-IOV devices that can have 28+ VFs each (and lots of fun caveats, as GitLab #72 showed us), the ability to hot plug/unplug hostdevs freely, and lots of running domains, it is clear that we're putting considerable pressure on the upper layers (oVirt, or a poor human admin) to keep track of the nodedevs each running domain is using. This is information we already have internally and could simply expose.
I have a few ideas to make this happen:
1 - upgrade 'nodedev-list' to add an extra 'assigned to' column
This is the most straightforward way of exposing the info. A simple 'nodedev-list' call can retrieve which domain is using which nodedev. To preserve the existing usage we can add a "--show-assigned-domains" option to control whether we display this info.

That would mean nodedev-list has to fetch the XML for every running guest, parse it, and extract the info. That's not a scalable solution.

2 - add an '<assigned_to>' element to the nodedev XML definition

I'm not a fan of exposing this in this particular XML because we would mix host/hw related attributes with domain info. But it would be easier to pull off compared to (1), so I'm mentioning it for the record.
This is similar to what we do for the nwfilter-binding and net-port XML where we have an <owner> element present.
The complication here is that right now we don't ever touch the nodedev driver when doing host device assignment, and so don't especially want to introduce a dependency.

One possible alternative would be a new API that operates on hostdevs instead of nodedevs. "hostdev-list" would list the devices assigned to any domain, as opposed to "nodedev-list", which lists all nodedevs of the host. I'm not sure whether this differentiation between hostdev and nodedev (i.e. a hostdev is a nodedev that is assigned to a domain) would be clear enough to users though. We would need to document it more clearly.

Wasn't this about the connection to the nodedev though? E.g. with mdevs we only have a UUID in the domain XML, which doesn't tell you anything about the device or its parent, and you also can't take the UUID and find the corresponding nodedev entry for it (well, you can hack it so that you construct the resulting nodedev name). Maybe I'm just misunderstanding the use case though.

This particular case I'm asking for comments on is related to PCI hostdevs (namely, SR-IOV virtual functions) that might get removed from the host while assigned to a running domain. We don't support that (albeit I posted patches that try to alleviate the issue in libvirt), and at the same time we don't provide easy tools for the user to check whether a specific hostdev is assigned to a domain. The user must query the running domains to find out.

About mdevs, isn't an mdev device created on demand via hypercalls to the physical device driver, and thrown away after use, the only real device being the parent? I'm not sure whether there is a use case/requirement for knowing the parent nodedev device. The parent device can be retrieved via sysfs AFAIC.

Thanks,
DHB
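[Editor's note] For reference, the sysfs lookup mentioned above can be done like this (a sketch, assuming a Linux host with mdev support; the UUID is an example):

  # /sys/bus/mdev/devices/<uuid> is a symlink into the parent device's
  # sysfs tree, so the parent is the directory containing the mdev entry
  uuid=b1ae8bf6-38b0-4c81-9d44-78ce3f520496
  dirname "$(readlink -f /sys/bus/mdev/devices/$uuid)"
  # e.g. -> /sys/devices/pci0000:00/0000:00:02.0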
Erik
Yet another alternative is a new API under "Device Commands". We already have attach-device, detach-device and so on, might as well have a new "list-devices" that does the deed. This fits with the "The following commands manipulate devices associated to domains." claim that we make about this class of commands.
Thanks,
DHB
Regards, Daniel

On Wed, Jan 06, 2021 at 02:24:35PM -0300, Daniel Henrique Barboza wrote:
On 1/6/21 8:13 AM, Erik Skultety wrote:
On Wed, Jan 06, 2021 at 08:00:52AM -0300, Daniel Henrique Barboza wrote:
On 1/6/21 7:09 AM, Daniel P. Berrangé wrote:
On Tue, Jan 05, 2021 at 05:18:13PM -0300, Daniel Henrique Barboza wrote:
Hi,
This is something I've been giving thought to after working on GitLab issue #72, and I decided to run it through the ML before hitting the code.

We don't have an easy way to retrieve the domain that is using a specific hostdev. Let's say that I want to know which domain is using the PCI card pci_0000_01_00_2. 'nodedev-dumpxml' will return the hardware/driver capabilities of the device, such as IOMMU group, driver and so on, but will not tell you which domain is using the hostdev, if any. 'nodedev-list' will simply list all nodedev names known to libvirt, without outputting any other information.

IIUC, the only existing way I can reliably tell whether a hostdev is being used by a domain, aside from registering the information myself during domain definition of course, is to run 'virsh dumpxml <domain>' for each running domain and match the nodedev name with the source.address element of the XML.

When we consider SR-IOV devices that can have 28+ VFs each (and lots of fun caveats, as GitLab #72 showed us), the ability to hot plug/unplug hostdevs freely, and lots of running domains, it is clear that we're putting considerable pressure on the upper layers (oVirt, or a poor human admin) to keep track of the nodedevs each running domain is using. This is information we already have internally and could simply expose.
I have a few ideas to make this happen:
1 - upgrade 'nodedev-list' to add an extra 'assigned to' column
This is the most straightforward way of exposing the info. A simple 'nodedev-list' call can retrieve which domain is using which nodedev. To preserve the existing usage we can add a "--show-assigned-domains" option to control whether we display this info.

That would mean nodedev-list has to fetch the XML for every running guest, parse it, and extract the info. That's not a scalable solution.

2 - add an '<assigned_to>' element to the nodedev XML definition

I'm not a fan of exposing this in this particular XML because we would mix host/hw related attributes with domain info. But it would be easier to pull off compared to (1), so I'm mentioning it for the record.
This is similar to what we do for the nwfilter-binding and net-port XML where we have an <owner> element present.
The complication here is that right now we don't ever touch the nodedev driver when doing host device assignment, and so don't especially want to introduce a dependency.

One possible alternative would be a new API that operates on hostdevs instead of nodedevs. "hostdev-list" would list the devices assigned to any domain, as opposed to "nodedev-list", which lists all nodedevs of the host. I'm not sure whether this differentiation between hostdev and nodedev (i.e. a hostdev is a nodedev that is assigned to a domain) would be clear enough to users though. We would need to document it more clearly.

Wasn't this about the connection to the nodedev though? E.g. with mdevs we only have a UUID in the domain XML, which doesn't tell you anything about the device or its parent, and you also can't take the UUID and find the corresponding nodedev entry for it (well, you can hack it so that you construct the resulting nodedev name). Maybe I'm just misunderstanding the use case though.

This particular case I'm asking for comments on is related to PCI hostdevs (namely, SR-IOV virtual functions) that might get removed from the host while assigned to a running domain. We don't support that (albeit I posted patches that try to alleviate the issue in libvirt), and at the same time we don't provide easy tools for the user to check whether a specific hostdev is assigned to a domain. The user must query the running domains to find out.
This isn't all that much different to other host resources that are given to guests. E.g. if pinning vCPUs 1:1 to pCPUs, the admin/mgmt app has to keep track of which pCPUs are used. If assigning host block devices to a guest, the admin/mgmt app has to keep track of the block devices in use. If assigning NICs for dedicated guest use, the admin/mgmt app has to keep track. Etc, etc.

Apps like oVirt, OpenStack and KubeVirt will all generally do this tracking themselves. This is especially important when they need to keep this usage information on a separate host so that the scheduler can use it when deciding which host to place a new guest on.

So, I'm not entirely convinced libvirt has a critical need to do anything for PCI devices in this respect.

About mdevs, isn't an mdev device created on demand via hypercalls to the physical device driver, and thrown away after use, the only real device being the parent? I'm not sure whether there is a use case/requirement for knowing the parent nodedev device. The parent device can be retrieved via sysfs AFAIC.

We shouldn't assume mdevs are created on demand. It is reasonable to precreate those which are needed at initial machine bootup.

Regards,
Daniel

--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 1/6/21 2:30 PM, Daniel P. Berrangé wrote:
On Wed, Jan 06, 2021 at 02:24:35PM -0300, Daniel Henrique Barboza wrote:
On 1/6/21 8:13 AM, Erik Skultety wrote:
On Wed, Jan 06, 2021 at 08:00:52AM -0300, Daniel Henrique Barboza wrote:
On 1/6/21 7:09 AM, Daniel P. Berrangé wrote:
On Tue, Jan 05, 2021 at 05:18:13PM -0300, Daniel Henrique Barboza wrote:
[...]
This is similar to what we do for the nwfilter-binding and net-port XML where we have an <owner> element present.
The complication here is that right now we don't ever touch the nodedev driver when doing host device assignment, and so don't especially want to introduce a dependency.

One possible alternative would be a new API that operates on hostdevs instead of nodedevs. "hostdev-list" would list the devices assigned to any domain, as opposed to "nodedev-list", which lists all nodedevs of the host. I'm not sure whether this differentiation between hostdev and nodedev (i.e. a hostdev is a nodedev that is assigned to a domain) would be clear enough to users though. We would need to document it more clearly.

Wasn't this about the connection to the nodedev though? E.g. with mdevs we only have a UUID in the domain XML, which doesn't tell you anything about the device or its parent, and you also can't take the UUID and find the corresponding nodedev entry for it (well, you can hack it so that you construct the resulting nodedev name). Maybe I'm just misunderstanding the use case though.

This particular case I'm asking for comments on is related to PCI hostdevs (namely, SR-IOV virtual functions) that might get removed from the host while assigned to a running domain. We don't support that (albeit I posted patches that try to alleviate the issue in libvirt), and at the same time we don't provide easy tools for the user to check whether a specific hostdev is assigned to a domain. The user must query the running domains to find out.
This isn't all that much different to other host resources that are given to guests. E.g. if pinning vCPUs 1:1 to pCPUs, the admin/mgmt app has to keep track of which pCPUs are used. If assigning host block devices to a guest, the admin/mgmt app has to keep track of the block devices in use. If assigning NICs for dedicated guest use, the admin/mgmt app has to keep track. Etc, etc.

Apps like oVirt, OpenStack and KubeVirt will all generally do this tracking themselves. This is especially important when they need to keep this usage information on a separate host so that the scheduler can use it when deciding which host to place a new guest on.

So, I'm not entirely convinced libvirt has a critical need to do anything for PCI devices in this respect.

I agree that whether we implement this or not, this is a 'good to have' feature at best, one that only the average admin who has access to an SR-IOV card and doesn't have oVirt-like apps to manage the VMs will end up using. Not sure how many people out there fit this profile TBH.

Definitely nothing that warrants breaking things to implement.

Thanks,
DHB

About mdevs, isn't an mdev device created on demand via hypercalls to the physical device driver, and thrown away after use, the only real device being the parent? I'm not sure whether there is a use case/requirement for knowing the parent nodedev device. The parent device can be retrieved via sysfs AFAIC.
We shouldn't assume mdevs are created on demand. It is reasonable to precreate those which are needed at initial machine bootup.
Regards, Daniel

On Wed, Jan 06, 2021 at 02:40:15PM -0300, Daniel Henrique Barboza wrote:
On 1/6/21 2:30 PM, Daniel P. Berrangé wrote:
On Wed, Jan 06, 2021 at 02:24:35PM -0300, Daniel Henrique Barboza wrote:
On 1/6/21 8:13 AM, Erik Skultety wrote:
On Wed, Jan 06, 2021 at 08:00:52AM -0300, Daniel Henrique Barboza wrote:
On 1/6/21 7:09 AM, Daniel P. Berrangé wrote:
On Tue, Jan 05, 2021 at 05:18:13PM -0300, Daniel Henrique Barboza wrote:
[...]
This is similar to what we do for the nwfilter-binding and net-port XML where we have an <owner> element present.
The complication here is that right now we don't ever touch the nodedev driver when doing host device assignment, and so don't especially want to introduce a dependency.

One possible alternative would be a new API that operates on hostdevs instead of nodedevs. "hostdev-list" would list the devices assigned to any domain, as opposed to "nodedev-list", which lists all nodedevs of the host. I'm not sure whether this differentiation between hostdev and nodedev (i.e. a hostdev is a nodedev that is assigned to a domain) would be clear enough to users though. We would need to document it more clearly.

Wasn't this about the connection to the nodedev though? E.g. with mdevs we only have a UUID in the domain XML, which doesn't tell you anything about the device or its parent, and you also can't take the UUID and find the corresponding nodedev entry for it (well, you can hack it so that you construct the resulting nodedev name). Maybe I'm just misunderstanding the use case though.

This particular case I'm asking for comments on is related to PCI hostdevs (namely, SR-IOV virtual functions) that might get removed from the host while assigned to a running domain. We don't support that (albeit I posted patches that try to alleviate the issue in libvirt), and at the same time we don't provide easy tools for the user to check whether a specific hostdev is assigned to a domain. The user must query the running domains to find out.
This isn't all that much different to other host resources that are given to guests. E.g. if pinning vCPUs 1:1 to pCPUs, the admin/mgmt app has to keep track of which pCPUs are used. If assigning host block devices to a guest, the admin/mgmt app has to keep track of the block devices in use. If assigning NICs for dedicated guest use, the admin/mgmt app has to keep track. Etc, etc.

Apps like oVirt, OpenStack and KubeVirt will all generally do this tracking themselves. This is especially important when they need to keep this usage information on a separate host so that the scheduler can use it when deciding which host to place a new guest on.

So, I'm not entirely convinced libvirt has a critical need to do anything for PCI devices in this respect.

I agree that whether we implement this or not, this is a 'good to have' feature at best, one that only the average admin who has access to an SR-IOV card and doesn't have oVirt-like apps to manage the VMs will end up using. Not sure how many people out there fit this profile TBH.

Definitely nothing that warrants breaking things to implement.

For the ad hoc use case we don't especially need to know which VM is using a PCI device. We just need to know whether the device is in use or not.

We know if a PCI device is in use because it will be bound to a specific kernel driver whenever assigned. Could we perhaps use this as a way to filter the list of nodedevs to only show those which are not assigned?

Regards,
Daniel

--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
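[Editor's note] For reference, the driver binding check Daniel describes can be done from sysfs. A sketch; vfio-pci and the legacy pci-stub are the usual stub drivers a PCI device is bound to while assigned to a guest:

  addr=0000:01:00.2   # PCI address corresponding to pci_0000_01_00_2
  drv=$(basename "$(readlink "/sys/bus/pci/devices/$addr/driver" 2>/dev/null)")
  case "$drv" in
      vfio-pci|pci-stub) echo "$addr: bound to $drv, likely assigned to a guest" ;;
      "")                echo "$addr: no driver bound" ;;
      *)                 echo "$addr: in use by host driver $drv" ;;
  esac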

On 1/6/21 3:05 PM, Daniel P. Berrangé wrote:
On Wed, Jan 06, 2021 at 02:40:15PM -0300, Daniel Henrique Barboza wrote:
On 1/6/21 2:30 PM, Daniel P. Berrangé wrote:
On Wed, Jan 06, 2021 at 02:24:35PM -0300, Daniel Henrique Barboza wrote:
On 1/6/21 8:13 AM, Erik Skultety wrote:
On Wed, Jan 06, 2021 at 08:00:52AM -0300, Daniel Henrique Barboza wrote:
On 1/6/21 7:09 AM, Daniel P. Berrangé wrote:
On Tue, Jan 05, 2021 at 05:18:13PM -0300, Daniel Henrique Barboza wrote:
[...]
This is similar to what we do for the nwfilter-binding and net-port XML where we have an <owner> element present.

The complication here is that right now we don't ever touch the nodedev driver when doing host device assignment, and so don't especially want to introduce a dependency.

One possible alternative would be a new API that operates on hostdevs instead of nodedevs. "hostdev-list" would list the devices assigned to any domain, as opposed to "nodedev-list", which lists all nodedevs of the host. I'm not sure whether this differentiation between hostdev and nodedev (i.e. a hostdev is a nodedev that is assigned to a domain) would be clear enough to users though. We would need to document it more clearly.

Wasn't this about the connection to the nodedev though? E.g. with mdevs we only have a UUID in the domain XML, which doesn't tell you anything about the device or its parent, and you also can't take the UUID and find the corresponding nodedev entry for it (well, you can hack it so that you construct the resulting nodedev name). Maybe I'm just misunderstanding the use case though.

This particular case I'm asking for comments on is related to PCI hostdevs (namely, SR-IOV virtual functions) that might get removed from the host while assigned to a running domain. We don't support that (albeit I posted patches that try to alleviate the issue in libvirt), and at the same time we don't provide easy tools for the user to check whether a specific hostdev is assigned to a domain. The user must query the running domains to find out.
This isn't all that much different to other host resources that are given to guests. E.g. if pinning vCPUs 1:1 to pCPUs, the admin/mgmt app has to keep track of which pCPUs are used. If assigning host block devices to a guest, the admin/mgmt app has to keep track of the block devices in use. If assigning NICs for dedicated guest use, the admin/mgmt app has to keep track. Etc, etc.

Apps like oVirt, OpenStack and KubeVirt will all generally do this tracking themselves. This is especially important when they need to keep this usage information on a separate host so that the scheduler can use it when deciding which host to place a new guest on.

So, I'm not entirely convinced libvirt has a critical need to do anything for PCI devices in this respect.

I agree that whether we implement this or not, this is a 'good to have' feature at best, one that only the average admin who has access to an SR-IOV card and doesn't have oVirt-like apps to manage the VMs will end up using. Not sure how many people out there fit this profile TBH.

Definitely nothing that warrants breaking things to implement.

For the ad hoc use case we don't especially need to know which VM is using a PCI device. We just need to know whether the device is in use or not.

We know if a PCI device is in use because it will be bound to a specific kernel driver whenever assigned. Could we perhaps use this as a way to filter the list of nodedevs to only show those which are not assigned?

Interesting. Making 'nodedev-list' show which PCI nodedevs are assigned/unassigned via sysfs is already good info to have, and we don't create any new dependency in the nodedev driver. I'll investigate it.

Thanks,
DHB
Regards, Daniel