[PATCH] udevProcessCSS: fix segfault

Don't process subchannel devices where `def->driver` is not set. This
fixes the following segfault:

Thread 21 "nodedev-init" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3ffb08fc910 (LWP 64303)]
(gdb) bt
#0  0x000003fffd1272b4 in __strcmp_vx () at /lib64/libc.so.6
#1  0x000003ffc260c3a8 in udevProcessCSS (device=0x3ff9018d130, def=0x3ff90194a90)
#2  0x000003ffc260cb78 in udevGetDeviceDetails (device=0x3ff9018d130, def=0x3ff90194a90)
#3  0x000003ffc260d126 in udevAddOneDevice (device=0x3ff9018d130)
#4  0x000003ffc260d414 in udevProcessDeviceListEntry (udev=0x3ffa810d800, list_entry=0x3ff90001990)
#5  0x000003ffc260d638 in udevEnumerateDevices (udev=0x3ffa810d800)
#6  0x000003ffc260e08e in nodeStateInitializeEnumerate (opaque=0x3ffa810d800)
#7  0x000003fffdaa14b6 in virThreadHelper (data=0x3ffa810df00)
#8  0x000003fffc309ed6 in start_thread ()
#9  0x000003fffd185e66 in thread_start ()
(gdb) p *def
$2 = {
  name = 0x0,
  sysfs_path = 0x3ff90198e80 "/sys/devices/css0/0.0.ff40",
  parent = 0x0,
  parent_sysfs_path = 0x0,
  parent_wwnn = 0x0,
  parent_wwpn = 0x0,
  parent_fabric_wwn = 0x0,
  driver = 0x0,
  devnode = 0x0,
  devlinks = 0x3ff90194670,
  caps = 0x3ff90194380
}

Fixes: 05e6cdafa6e0 ("node_device: detect CSS devices")
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
---
 src/node_device/node_device_udev.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c
index 5f2841bb7d8e..12e3f30badd1 100644
--- a/src/node_device/node_device_udev.c
+++ b/src/node_device/node_device_udev.c
@@ -1130,8 +1130,9 @@ udevProcessCSS(struct udev_device *device,
                virNodeDeviceDefPtr def)
 {
     /* only process IO subchannel and vfio-ccw devices to keep the list sane */
-    if (STRNEQ(def->driver, "io_subchannel") &&
-        STRNEQ(def->driver, "vfio_ccw"))
+    if (!def->driver ||
+        (STRNEQ(def->driver, "io_subchannel") &&
+         STRNEQ(def->driver, "vfio_ccw")))
         return -1;
 
     if (udevGetCCWAddress(def->sysfs_path, &def->caps->data) < 0)
--
2.25.4
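The guard added by the patch can be illustrated in isolation. The following is a minimal sketch, not libvirt code: `demo_node_device_def` and `demo_css_should_skip` are hypothetical names, the struct carries only the one field that matters here, and `STRNEQ` is redefined locally on the assumption that it expands to a plain `strcmp()` test. It shows why the NULL check has to come first: `||` short-circuits, so `strcmp()` is never reached with a NULL `driver`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for libvirt's STRNEQ (assumed to be a plain strcmp() test). */
#define STRNEQ(a, b) (strcmp((a), (b)) != 0)

/* Hypothetical, trimmed-down device definition for illustration only. */
struct demo_node_device_def {
    const char *driver; /* NULL when the subchannel is not bound to a driver */
};

/* Returns true when the subchannel should be skipped, i.e. when it is
 * unbound or bound to a driver other than io_subchannel or vfio_ccw. */
static bool
demo_css_should_skip(const struct demo_node_device_def *def)
{
    /* The NULL test comes first: || short-circuits, so strcmp() is
     * never called with a NULL argument. */
    return !def->driver ||
           (STRNEQ(def->driver, "io_subchannel") &&
            STRNEQ(def->driver, "vfio_ccw"));
}
```

With this shape, an unbound subchannel and one bound to some other driver, such as chsc_subchannel, are both filtered out, while io_subchannel and vfio_ccw devices pass through.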

On Mon, Sep 21, 2020 at 07:06:32PM +0200, Marc Hartmayer wrote:
> Don't process subchannel devices where `def->driver` is not set. This
> fixes the following segfault:
>
> Thread 21 "nodedev-init" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x3ffb08fc910 (LWP 64303)]
> (gdb) bt
> #0  0x000003fffd1272b4 in __strcmp_vx () at /lib64/libc.so.6
> #1  0x000003ffc260c3a8 in udevProcessCSS (device=0x3ff9018d130, def=0x3ff90194a90)
> #2  0x000003ffc260cb78 in udevGetDeviceDetails (device=0x3ff9018d130, def=0x3ff90194a90)
> #3  0x000003ffc260d126 in udevAddOneDevice (device=0x3ff9018d130)
> #4  0x000003ffc260d414 in udevProcessDeviceListEntry (udev=0x3ffa810d800, list_entry=0x3ff90001990)
> #5  0x000003ffc260d638 in udevEnumerateDevices (udev=0x3ffa810d800)
> #6  0x000003ffc260e08e in nodeStateInitializeEnumerate (opaque=0x3ffa810d800)
> #7  0x000003fffdaa14b6 in virThreadHelper (data=0x3ffa810df00)
> #8  0x000003fffc309ed6 in start_thread ()
> #9  0x000003fffd185e66 in thread_start ()
> (gdb) p *def
> $2 = {
>   name = 0x0,
>   sysfs_path = 0x3ff90198e80 "/sys/devices/css0/0.0.ff40",

Okay, this patch fixes the segfault. However, if ^this generated it because
the driver name is not set, how do we even get to the resulting device tree
as outlined in 05e6cdafa6e0?

+- css_0_0_003a
    |
    +- ccw_0_0_1a2b
        |
        +- scsi_host0

What kind of CSS device is the one causing the error? If we skip this CSS
device, we don't generate a name for it and don't put it in the list, so I'm
quite puzzled about what I missed in the IBM document and thus in the review
process.

FWIW:
Reviewed-by: Erik Skultety <eskultet@redhat.com>

On 9/22/20 8:26 AM, Erik Skultety wrote:
> Okay, this patch fixes the segfault. However, if ^this generated it because
> the driver name is not set, how do we even get to the resulting device tree
> as outlined in 05e6cdafa6e0?
>
> +- css_0_0_003a
>     |
>     +- ccw_0_0_1a2b
>         |
>         +- scsi_host0
>
> What kind of CSS device is the one causing the error? If we skip this CSS
> device, we don't generate a name for it and don't put it in the list, so I'm
> quite puzzled on what I missed in the IBM document and thus in the review
> process.
>
> FWIW:
> Reviewed-by: Erik Skultety <eskultet@redhat.com>
Erik,
for whatever reason Marc's system does not have the subchannel device driver
"chsc_subchannel" loaded. Therefore the subchannel is not bound to any driver.
By the way, it would be a filtered device anyway, since we want to show users
only io_subchannel and vfio_ccw subchannels.
The tree above shows a subchannel bound to the io_subchannel device driver.

Maybe it helps you to see a full example of how it looks on my system:

# virsh nodedev-list --tree
computer
  |
  +- css_0_0_000e
  |   |
  |   +- ccw_0_0_f500
  |
  +- css_0_0_000f
  |   |
  |   +- ccw_0_0_f501
  |
  +- css_0_0_0010
  |   |
  |   +- ccw_0_0_f502
  |
  +- css_0_0_0011
  |   |
  |   +- ccw_0_0_bd00
  |
  +- css_0_0_0012
  |   |
  |   +- ccw_0_0_bd01
  |
  +- css_0_0_0013
  |   |
  |   +- ccw_0_0_bd02
  |
  +- css_0_0_002f
  |   |
  |   +- ccw_0_0_19c0
  |       |
  |       +- scsi_host3
  |           |
  |           +- scsi_target3_0_1
  |               |
  |               +- scsi_3_0_1_1078935649
  |                   |
  |                   +- block_sda_36005076307ffc5e3000000000000614f
  |                   +- scsi_generic_sg0
  |
  +- css_0_0_0038
  |   |
  |   +- ccw_0_0_1900
  |       |
  |       +- scsi_host1
  |           |
  |           +- scsi_target1_0_0
  |           |   |
  |           |   +- scsi_1_0_0_1075986530
  |           |   |   |
  |           |   |   +- block_sdd_36005076307ffc5e30000000000006222
  |           |   |   +- scsi_generic_sg3
  |           |   |
  |           |   +- scsi_1_0_0_1078935649
  |           |       |
  |           |       +- block_sdc_36005076307ffc5e3000000000000614f
  |           |       +- scsi_generic_sg2
  |           |
  |           +- scsi_target1_0_2
  |               |
  |               +- scsi_1_0_2_1075986530
  |               |   |
  |               |   +- block_sdf_36005076307ffc5e30000000000006222
  |               |   +- scsi_generic_sg5
  |               |
  |               +- scsi_1_0_2_1078935649
  |                   |
  |                   +- block_sde_36005076307ffc5e3000000000000614f
  |                   +- scsi_generic_sg4
  |
  +- css_0_0_003a
  |   |
  |   +- ccw_0_0_1940
  |       |
  |       +- scsi_host0
  |           |
  |           +- scsi_target0_0_1
  |           |   |
  |           |   +- scsi_0_0_1_1075986530
  |           |   |   |
  |           |   |   +- block_sdi_36005076307ffc5e30000000000006222
  |           |   |   +- scsi_generic_sg8
  |           |   |
  |           |   +- scsi_0_0_1_1078935649
  |           |       |
  |           |       +- block_sdh_36005076307ffc5e3000000000000614f
  |           |       +- scsi_generic_sg7
  |           |
  |           +- scsi_target0_0_3
  |               |
  |               +- scsi_0_0_3_1075986530
  |               |   |
  |               |   +- block_sdg_36005076307ffc5e30000000000006222
  |               |   +- scsi_generic_sg6
  |               |
  |               +- scsi_0_0_3_1078935649
  |                   |
  |                   +- block_sdb_36005076307ffc5e3000000000000614f
  |                   +- scsi_generic_sg1
  |
  +- css_0_0_003c
  |   |
  |   +- ccw_0_0_1980
  |       |
  |       +- scsi_host2
  |
  +- css_0_0_006b
  |   |
  |   +- ccw_0_0_1000
  |       |
  |       +- block_dasdg_IBM_750000000DHVL1_0001_00
  |
  +- css_0_0_006c
  |   |
  |   +- ccw_0_0_1001
  |       |
  |       +- block_dasda_IBM_750000000DHVL1_0001_01
  |
  +- css_0_0_006d
  |   |
  |   +- ccw_0_0_1002
  |       |
  |       +- block_dasdb_IBM_750000000DHVL1_0001_02
  |
  +- css_0_0_006e
  |   |
  |   +- ccw_0_0_1003
  |       |
  |       +- block_dasdc_IBM_750000000DHVL1_0001_03
  |
  +- css_0_0_006f
  |   |
  |   +- ccw_0_0_1004
  |       |
  |       +- block_dasdd_IBM_750000000DHVL1_0001_04
  |
  +- css_0_0_0070
  |   |
  |   +- ccw_0_0_1005
  |       |
  |       +- block_dasdh_IBM_750000000DHVL1_0001_05
  |
  +- css_0_0_0071
  |   |
  |   +- ccw_0_0_1006
  |       |
  |       +- block_dasdf_IBM_750000000DHVL1_0001_06
  |
  +- css_0_0_0072
  |   |
  |   +- mdev_2e7237a7_0445_407e_b880_96f63a3cd17d
  |
  +- net_encbd00_02_ff_e3_80_c4_ef
  +- net_encf500_02_ff_e3_80_76_89
  +- net_lo_00_00_00_00_00_00
  +- pci_0001_00_00_0
      |
      +- net_enP1s8_82_01_2d_0c_bb_b0
      +- net_enP1s8d1_82_01_2d_0c_bb_b1

All subchannels but css_0_0_0072 are bound to io_subchannel.
css_0_0_0072 is bound to vfio_ccw and therefore lists an mdev as its child.

Here are the XML dumps:

# virsh nodedev-dumpxml css_0_0_0071
<device>
  <name>css_0_0_0071</name>
  <path>/sys/devices/css0/0.0.0071</path>
  <parent>computer</parent>
  <driver>
    <name>io_subchannel</name>
  </driver>
  <capability type='css'>
    <cssid>0x0</cssid>
    <ssid>0x0</ssid>
    <devno>0x0071</devno>
  </capability>
</device>

# virsh nodedev-dumpxml css_0_0_0072
<device>
  <name>css_0_0_0072</name>
  <path>/sys/devices/css0/0.0.0072</path>
  <parent>computer</parent>
  <driver>
    <name>vfio_ccw</name>
  </driver>
  <capability type='css'>
    <cssid>0x0</cssid>
    <ssid>0x0</ssid>
    <devno>0x0072</devno>
  </capability>
</device>

I hope that helps a bit...
--
Kind regards
Boris Fiuczynski

IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Gregor Pillen
Management: Dirk Wittkopp
Registered office: Böblingen
Court of registration: Amtsgericht Stuttgart, HRB 243294

On Tue, Sep 22, 2020 at 10:45:09AM +0200, Boris Fiuczynski wrote:
> Erik,
> for whatever reason Marc's system does not have the subchannel device driver
> "chsc_subchannel" loaded. Therefore the subchannel is not bound to any driver.
> By the way, it would be a filtered device anyway, since we want to show users
> only io_subchannel and vfio_ccw subchannels.
> The tree above shows a subchannel bound to the io_subchannel device driver.
>
> Maybe it helps you to see a full example of how it looks on my system:
Oh, it would have been a different driver anyway - impossible to spot just
from the address :).

Yeah, the tree dump makes it much clearer, thanks. I pushed the patch.

Erik