[libvirt] [Question] nodedev-list --cap return nothing after creating a mdev

Hi, I meet a problem that libvirt handle new mediated devices wrongly. Command `virsh nodedev-list -cap mdev` return none after I create a mdev, and the root cause I found is: // kernel code int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid) { ... ret = device_register(&mdev->dev); // send udev event here --- 1 if (ret) { put_device(&mdev->dev); goto create_err; } ret = mdev_device_create_ops(kobj, mdev); if (ret) goto create_failed; ret = mdev_create_sysfs_files(&mdev->dev, type); // create mdev_type symbol link here --- 2 if (ret) { mdev_device_remove_ops(mdev, true); goto create_failed; } mdev->type_kobj = kobj; dev_dbg(&mdev->dev, "MDEV: created\n"); ... } At point 1, kernel send a udev event(add). If libvirt receive this event and handle it before kernel code reach point 2, libvirt just fail to resolve link such as '/sys/devices/pci0000:80/0000:80:02.0/0000:81:00.0/36b79575-ada8-4406-893d-8cfd8d10a984/mdev_type' because the link haven't been created yet. Then libvirt thinks this devices doesn't exsit and stop tracking info of this device. Finally, command `vish nodedev-list -cap` return nothing though the device exist at host indeed. To make it easy to reproduce this problem, I just add a `udelay(1)` before point 2 by kprobe. So I think maybe we should add 'kobject_uevent(..., KOBJ_ADD)' after point 2 instead of point 1 ? Or is there any other method to solve this problem? Thanks, Zongyong Wu

On Thu, Feb 01, 2018 at 07:19:45AM +0000, Wuzongyong (Euler Dept) wrote:
Hi,
I meet a problem that libvirt handle new mediated devices wrongly. Command `virsh nodedev-list -cap mdev` return none after I create a mdev, and the root cause I found is:
// kernel code int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid) { ... ret = device_register(&mdev->dev); // send udev event here --- 1 if (ret) { put_device(&mdev->dev); goto create_err; }
ret = mdev_device_create_ops(kobj, mdev); if (ret) goto create_failed;
ret = mdev_create_sysfs_files(&mdev->dev, type); // create mdev_type symbol link here --- 2 if (ret) { mdev_device_remove_ops(mdev, true); goto create_failed; }
mdev->type_kobj = kobj; dev_dbg(&mdev->dev, "MDEV: created\n"); ... }
At point 1, kernel send a udev event(add). If libvirt receive this event and handle it before kernel code reach point 2, libvirt just fail to resolve link such as '/sys/devices/pci0000:80/0000:80:02.0/0000:81:00.0/36b79575-ada8-4406-893d-8cfd8d10a984/mdev_type' because the link haven't been created yet. Then libvirt thinks this devices doesn't exsit and stop tracking info of this device. Finally, command `vish nodedev-list -cap` return nothing though the device exist at host indeed.
To make it easy to reproduce this problem, I just add a `udelay(1)` before point 2 by kprobe.
So I think maybe we should add 'kobject_uevent(..., KOBJ_ADD)' after point 2 instead of point 1 ? Or is there any other method to solve this problem?
Hi, just a note, we worked around this in upstream libvirt already. Thanks for the pointer though, I'd also be interested in possible solutions in kernel. Erik

Could you please share what the commit id is ? Thanks, Zongyong Wu
-----Original Message----- From: Erik Skultety [mailto:eskultet@redhat.com] Sent: Thursday, February 01, 2018 4:09 PM To: Wuzongyong (Euler Dept) <cordius.wu@huawei.com> Cc: Alex Williamson <alex.williamson@redhat.com>; libvir-list@redhat.com; weijinfen <weijinfen@huawei.com> Subject: Re: [libvirt][Question] nodedev-list --cap return nothing after creating a mdev
Hi,
I meet a problem that libvirt handle new mediated devices wrongly. Command `virsh nodedev-list -cap mdev` return none after I create a mdev, and the root cause I found is:
// kernel code int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid) { ... ret = device_register(&mdev->dev); // send udev event here --- 1 if (ret) { put_device(&mdev->dev); goto create_err; }
ret = mdev_device_create_ops(kobj, mdev); if (ret) goto create_failed;
ret = mdev_create_sysfs_files(&mdev->dev, type); // create mdev_type symbol link here --- 2 if (ret) { mdev_device_remove_ops(mdev, true); goto create_failed; }
mdev->type_kobj = kobj; dev_dbg(&mdev->dev, "MDEV: created\n"); ... }
At point 1, kernel send a udev event(add). If libvirt receive this event and handle it before kernel code reach point 2, libvirt just fail to resolve link such as '/sys/devices/pci0000:80/0000:80:02.0/0000:81:00.0/36b79575-ada8-4406- 893d-8cfd8d10a984/mdev_type' because the link haven't been created yet. Then libvirt thinks this devices doesn't exsit and stop tracking info of this device. Finally, command `vish nodedev-list -cap` return nothing though the device exist at host indeed.
To make it easy to reproduce this problem, I just add a `udelay(1)` before point 2 by kprobe.
So I think maybe we should add 'kobject_uevent(..., KOBJ_ADD)' after
On Thu, Feb 01, 2018 at 07:19:45AM +0000, Wuzongyong (Euler Dept) wrote: point 2 instead of point 1 ?
Or is there any other method to solve this problem?
Hi, just a note, we worked around this in upstream libvirt already.
Thanks for the pointer though, I'd also be interested in possible solutions in kernel.
Erik

Hi, I solve this problem by sending a CHANGE event to userspace after creating the symbol link like sriov driver. diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c index 1269910..a23f1d7 100644 --- a/drivers/vfio/mdev/mdev_core.c +++ b/drivers/vfio/mdev/mdev_core.c @@ -351,6 +351,8 @@ int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid) goto create_failed; } + kobject_uevent(&(mdev->dev.kobj), KOBJ_CHANGE); + mdev->type_kobj = kobj; dev_dbg(&mdev->dev, "MDEV: created\n"); Is this patch ok? Thanks, Zongyong Wu
-----Original Message----- From: Erik Skultety [mailto:eskultet@redhat.com] Sent: Thursday, February 01, 2018 4:09 PM To: Wuzongyong (Euler Dept) <cordius.wu@huawei.com> Cc: Alex Williamson <alex.williamson@redhat.com>; libvir-list@redhat.com; weijinfen <weijinfen@huawei.com> Subject: Re: [libvirt][Question] nodedev-list --cap return nothing after creating a mdev
Hi,
I meet a problem that libvirt handle new mediated devices wrongly. Command `virsh nodedev-list -cap mdev` return none after I create a mdev, and the root cause I found is:
// kernel code int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid) { ... ret = device_register(&mdev->dev); // send udev event here --- 1 if (ret) { put_device(&mdev->dev); goto create_err; }
ret = mdev_device_create_ops(kobj, mdev); if (ret) goto create_failed;
ret = mdev_create_sysfs_files(&mdev->dev, type); // create mdev_type symbol link here --- 2 if (ret) { mdev_device_remove_ops(mdev, true); goto create_failed; }
mdev->type_kobj = kobj; dev_dbg(&mdev->dev, "MDEV: created\n"); ... }
At point 1, kernel send a udev event(add). If libvirt receive this event and handle it before kernel code reach point 2, libvirt just fail to resolve link such as '/sys/devices/pci0000:80/0000:80:02.0/0000:81:00.0/36b79575-ada8-4406- 893d-8cfd8d10a984/mdev_type' because the link haven't been created yet. Then libvirt thinks this devices doesn't exsit and stop tracking info of this device. Finally, command `vish nodedev-list -cap` return nothing though the device exist at host indeed.
To make it easy to reproduce this problem, I just add a `udelay(1)` before point 2 by kprobe.
So I think maybe we should add 'kobject_uevent(..., KOBJ_ADD)' after
On Thu, Feb 01, 2018 at 07:19:45AM +0000, Wuzongyong (Euler Dept) wrote: point 2 instead of point 1 ?
Or is there any other method to solve this problem?
Hi, just a note, we worked around this in upstream libvirt already.
Thanks for the pointer though, I'd also be interested in possible solutions in kernel.
Erik

On Thu, Feb 01, 2018 at 09:08:59AM +0000, Wuzongyong (Euler Dept) wrote:
Hi,
I solve this problem by sending a CHANGE event to userspace after creating the symbol link like sriov driver.
diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c index 1269910..a23f1d7 100644 --- a/drivers/vfio/mdev/mdev_core.c +++ b/drivers/vfio/mdev/mdev_core.c @@ -351,6 +351,8 @@ int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid) goto create_failed; }
+ kobject_uevent(&(mdev->dev.kobj), KOBJ_CHANGE); + mdev->type_kobj = kobj; dev_dbg(&mdev->dev, "MDEV: created\n");
Is this patch ok?
I'm not really familiar with kernel, but from a layman's point of view, the ADD and CHANGE event should be clearly distinguished, but here you expect the CHANGE event to be the result of creating the sysfs entry, whereas in general you'd expect this from the ADD event, since it's really the object has just been created for the first time, right? Erik
Thanks, Zongyong Wu
-----Original Message----- From: Erik Skultety [mailto:eskultet@redhat.com] Sent: Thursday, February 01, 2018 4:09 PM To: Wuzongyong (Euler Dept) <cordius.wu@huawei.com> Cc: Alex Williamson <alex.williamson@redhat.com>; libvir-list@redhat.com; weijinfen <weijinfen@huawei.com> Subject: Re: [libvirt][Question] nodedev-list --cap return nothing after creating a mdev
Hi,
I meet a problem that libvirt handle new mediated devices wrongly. Command `virsh nodedev-list -cap mdev` return none after I create a mdev, and the root cause I found is:
// kernel code int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid) { ... ret = device_register(&mdev->dev); // send udev event here --- 1 if (ret) { put_device(&mdev->dev); goto create_err; }
ret = mdev_device_create_ops(kobj, mdev); if (ret) goto create_failed;
ret = mdev_create_sysfs_files(&mdev->dev, type); // create mdev_type symbol link here --- 2 if (ret) { mdev_device_remove_ops(mdev, true); goto create_failed; }
mdev->type_kobj = kobj; dev_dbg(&mdev->dev, "MDEV: created\n"); ... }
At point 1, kernel send a udev event(add). If libvirt receive this event and handle it before kernel code reach point 2, libvirt just fail to resolve link such as '/sys/devices/pci0000:80/0000:80:02.0/0000:81:00.0/36b79575-ada8-4406- 893d-8cfd8d10a984/mdev_type' because the link haven't been created yet. Then libvirt thinks this devices doesn't exsit and stop tracking info of this device. Finally, command `vish nodedev-list -cap` return nothing though the device exist at host indeed.
To make it easy to reproduce this problem, I just add a `udelay(1)` before point 2 by kprobe.
So I think maybe we should add 'kobject_uevent(..., KOBJ_ADD)' after
On Thu, Feb 01, 2018 at 07:19:45AM +0000, Wuzongyong (Euler Dept) wrote: point 2 instead of point 1 ?
Or is there any other method to solve this problem?
Hi, just a note, we worked around this in upstream libvirt already.
Thanks for the pointer though, I'd also be interested in possible solutions in kernel.
Erik

a mdev
On Thu, Feb 01, 2018 at 09:08:59AM +0000, Wuzongyong (Euler Dept) wrote:
Hi,
I solve this problem by sending a CHANGE event to userspace after creating the symbol link like sriov driver.
diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c index 1269910..a23f1d7 100644 --- a/drivers/vfio/mdev/mdev_core.c +++ b/drivers/vfio/mdev/mdev_core.c @@ -351,6 +351,8 @@ int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid) goto create_failed; }
+ kobject_uevent(&(mdev->dev.kobj), KOBJ_CHANGE); + mdev->type_kobj = kobj; dev_dbg(&mdev->dev, "MDEV: created\n");
Is this patch ok?
I'm not really familiar with kernel, but from a layman's point of view, the ADD and CHANGE event should be clearly distinguished, but here you expect the CHANGE event to be the result of creating the sysfs entry, whereas in general you'd expect this from the ADD event, since it's really the object has just been created for the first time, right?
Erik
The ADD event has been sended before sending CHANGE event. And before sending ADD event, the mdev sysfs file node and some general attribute such as power/uevent has been created, but the mdev_type file(link libvirt want to resolve) has not been created yet. So I think maybe we should send a CHANGE event to tell libvirt this link has been created so libvirt can resolve it correctly. By the way, VF is handled like this manner in kernel in case of SR-IOV, you can ref kernel code in driver/pci/iov.c(function pci_iov_add_virtfn). Thanks, Zongyong Wu
Thanks, Zongyong Wu
-----Original Message----- From: Erik Skultety [mailto:eskultet@redhat.com] Sent: Thursday, February 01, 2018 4:09 PM To: Wuzongyong (Euler Dept) <cordius.wu@huawei.com> Cc: Alex Williamson <alex.williamson@redhat.com>; libvir-list@redhat.com; weijinfen <weijinfen@huawei.com> Subject: Re: [libvirt][Question] nodedev-list --cap return nothing after creating a mdev
On Thu, Feb 01, 2018 at 07:19:45AM +0000, Wuzongyong (Euler Dept)
wrote:
Hi,
I meet a problem that libvirt handle new mediated devices wrongly. Command `virsh nodedev-list -cap mdev` return none after I create a mdev, and the root cause I found is:
// kernel code int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid) { ... ret = device_register(&mdev->dev); // send udev event here --- 1 if (ret) { put_device(&mdev->dev); goto create_err; }
ret = mdev_device_create_ops(kobj, mdev); if (ret) goto create_failed;
ret = mdev_create_sysfs_files(&mdev->dev, type); // create mdev_type symbol link here --- 2 if (ret) { mdev_device_remove_ops(mdev, true); goto create_failed; }
mdev->type_kobj = kobj; dev_dbg(&mdev->dev, "MDEV: created\n"); ... }
At point 1, kernel send a udev event(add). If libvirt receive this event and handle it before kernel code reach point 2, libvirt just fail to resolve link such as '/sys/devices/pci0000:80/0000:80:02.0/0000:81:00.0/36b79575-ada8-440 6- 893d-8cfd8d10a984/mdev_type' because the link haven't been created yet. Then libvirt thinks this devices doesn't exsit and stop tracking info of this device. Finally, command `vish nodedev-list -cap` return nothing though the device exist at host indeed.
To make it easy to reproduce this problem, I just add a `udelay(1)` before point 2 by kprobe.
So I think maybe we should add 'kobject_uevent(..., KOBJ_ADD)' after point 2 instead of point 1 ? Or is there any other method to solve this problem?
Hi, just a note, we worked around this in upstream libvirt already.
Thanks for the pointer though, I'd also be interested in possible solutions in kernel.
Erik

On Thu, Feb 01, 2018 at 12:03:42PM +0000, Wuzongyong (Euler Dept) wrote:
a mdev
On Thu, Feb 01, 2018 at 09:08:59AM +0000, Wuzongyong (Euler Dept) wrote:
Hi,
I solve this problem by sending a CHANGE event to userspace after creating the symbol link like sriov driver.
diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c index 1269910..a23f1d7 100644 --- a/drivers/vfio/mdev/mdev_core.c +++ b/drivers/vfio/mdev/mdev_core.c @@ -351,6 +351,8 @@ int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid) goto create_failed; }
+ kobject_uevent(&(mdev->dev.kobj), KOBJ_CHANGE); + mdev->type_kobj = kobj; dev_dbg(&mdev->dev, "MDEV: created\n");
Is this patch ok?
I'm not really familiar with kernel, but from a layman's point of view, the ADD and CHANGE event should be clearly distinguished, but here you expect the CHANGE event to be the result of creating the sysfs entry, whereas in general you'd expect this from the ADD event, since it's really the object has just been created for the first time, right?
Erik
The ADD event has been sended before sending CHANGE event. And before sending ADD event, the mdev sysfs file node and some general attribute such as power/uevent has been created, but the mdev_type file(link libvirt want to resolve) has not been created yet. So I think maybe we should send a CHANGE event to tell libvirt this link has been created so libvirt can resolve it correctly.
Well, as much beneficial as getting an event from udev can be, it wouldn't change anything within libvirt really, since when we get the ADD event, we try to wait for the link to appear and only then continue. With the CHANGE event we'll update the device again. I'm only saying it because we'll be supporting running upstream on kernels prior to this potential change/fix, so it will take "a few" years until libvirt is going to solely rely on udev events here. The commit ID you asked in the other mail is 1af45804. Erik

a mdev
On Thu, Feb 01, 2018 at 09:08:59AM +0000, Wuzongyong (Euler Dept)
wrote:
Hi,
I solve this problem by sending a CHANGE event to userspace after creating the symbol link like sriov driver.
diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c index 1269910..a23f1d7 100644 --- a/drivers/vfio/mdev/mdev_core.c +++ b/drivers/vfio/mdev/mdev_core.c @@ -351,6 +351,8 @@ int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid) goto create_failed; }
+ kobject_uevent(&(mdev->dev.kobj), KOBJ_CHANGE); + mdev->type_kobj = kobj; dev_dbg(&mdev->dev, "MDEV: created\n");
Is this patch ok?
I'm not really familiar with kernel, but from a layman's point of view, the ADD and CHANGE event should be clearly distinguished, but here you expect the CHANGE event to be the result of creating the sysfs entry, whereas in general you'd expect this from the ADD event, since it's really the object has just been created for the first time, right?
Erik
The ADD event has been sended before sending CHANGE event. And before sending ADD event, the mdev sysfs file node and some general attribute such as power/uevent has been created, but the mdev_type file(link libvirt want to resolve) has not been created yet. So I think maybe we should send a CHANGE event to tell libvirt this link has been created so
On Thu, Feb 01, 2018 at 12:03:42PM +0000, Wuzongyong (Euler Dept) wrote: libvirt can resolve it correctly.
Well, as much beneficial as getting an event from udev can be, it wouldn't change anything within libvirt really, since when we get the ADD event, we try to wait for the link to appear and only then continue. With the CHANGE event we'll update the device again. I'm only saying it because we'll be supporting running upstream on kernels prior to this potential change/fix, so it will take "a few" years until libvirt is going to solely rely on udev events here.
The commit ID you asked in the other mail is 1af45804.
Erik
Yes, you are right. I forgot to take the case that kernels prior to this potential change into account. Thanks, Zongyong Wu
participants (2)
-
Erik Skultety
-
Wuzongyong (Euler Dept)