About PCIe hot-plug on aarch64

Hello,

We run OpenStack Stein on Arm, with nova-compute (using libvirt as the virt driver) running on the Arm hosts. We found that when VMs with disks (Ceph RBD) are built on the Arm hosts, the VM cannot attach all of its disks correctly. For example, a VM built with six disks may end up with only three attached. No obvious error can be found in the nova-compute or libvirt logs. Comparing aarch64 and x86, we found that when a disk is detached, the dmesg output in the VM's OS differs. Maybe the pciehp parameters are different?

Has anyone met this problem? Any suggestions?

x86: nothing at all

aarch64:
Sep 29 15:28:55 * kernel: pciehp 0000:00:01.5:pcie004: Slot(0-5): Attention button pressed
Sep 29 15:28:55 * kernel: pciehp 0000:00:01.5:pcie004: Slot(0-5): Powering off due to button press
Sep 29 15:29:00 * kernel: pciehp 0000:00:01.5:pcie004: Slot(0-5): Attention button pressed
Sep 29 15:29:00 * kernel: pciehp 0000:00:01.5:pcie004: Slot(0-5): Button cancel
Sep 29 15:29:00 * kernel: pciehp 0000:00:01.5:pcie004: Slot(0-5): Action canceled due to button press
Sep 29 15:29:07 * kernel: pciehp 0000:00:01.5:pcie004: Slot(0-5): Attention button pressed
Sep 29 15:29:07 * kernel: pciehp 0000:00:01.5:pcie004: Slot(0-5): Powering off due to button press
Sep 29 15:29:13 * kernel: pciehp 0000:00:01.5:pcie004: Slot(0-5): Link Up
Sep 29 15:29:16 * kernel: pciehp 0000:00:01.5:pcie004: Failed to check link status
Sep 29 15:29:18 * kernel: pciehp 0000:00:01.6:pcie004: Slot(0-6): Attention button pressed
Sep 29 15:29:18 * kernel: pciehp 0000:00:01.6:pcie004: Slot(0-6): Powering off due to button press
Sep 29 15:29:23 * kernel: pciehp 0000:00:01.6:pcie004: Slot(0-6): Attention button pressed
Sep 29 15:29:23 * kernel: pciehp 0000:00:01.6:pcie004: Slot(0-6): Button cancel
Sep 29 15:29:23 * kernel: pciehp 0000:00:01.6:pcie004: Slot(0-6): Action canceled due to button press
Sep 29 15:29:30 * kernel: pciehp 0000:00:01.6:pcie004: Slot(0-6): Attention button pressed
Sep 29 15:29:30 * kernel: pciehp 0000:00:01.6:pcie004: Slot(0-6): Powering off due to button press
Sep 29 15:29:36 * kernel: pciehp 0000:00:01.6:pcie004: Slot(0-6): Link Up
Sep 29 15:29:39 * kernel: pciehp 0000:00:01.6:pcie004: Failed to check link status
Sep 29 15:29:39 * kernel: pciehp 0000:00:01.7:pcie004: Slot(0-7): Attention button pressed
Sep 29 15:29:39 * kernel: pciehp 0000:00:01.7:pcie004: Slot(0-7): Powering off due to button press
Sep 29 15:29:45 * kernel: pciehp 0000:00:01.7:pcie004: Slot(0-7): Attention button pressed
Sep 29 15:29:45 * kernel: pciehp 0000:00:01.7:pcie004: Slot(0-7): Button cancel
Sep 29 15:29:45 * kernel: pciehp 0000:00:01.7:pcie004: Slot(0-7): Action canceled due to button press
Sep 29 15:29:52 * kernel: pciehp 0000:00:01.7:pcie004: Slot(0-7): Attention button pressed
Sep 29 15:29:52 * kernel: pciehp 0000:00:01.7:pcie004: Slot(0-7): Powering off due to button press
Sep 29 15:29:58 * kernel: pciehp 0000:00:01.7:pcie004: Slot(0-7): Link Up
Sep 29 15:30:01 * kernel: pciehp 0000:00:02.0:pcie004: Slot(0-8): Attention button pressed
Sep 29 15:30:01 * kernel: pciehp 0000:00:02.0:pcie004: Slot(0-8): Powering off due to button press
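To compare the slots in the aarch64 log more easily, the messages can be counted per PCIe port. The Python sketch below is not from the thread; the helper name `summarize_pciehp` is hypothetical, and it assumes the syslog line format shown above (timestamp, host, `kernel: pciehp <port>:pcie004: ...`). In the Linux pciehp driver, an attention-button press starts a delay of roughly five seconds before power-off, and a second press during that window cancels the pending action; the five- to six-second gaps before each "Button cancel" above are consistent with that, so the sketch counts presses, cancellations and failed link checks.

```python
import re
import sys
from collections import defaultdict

# Matches guest syslog lines such as:
#   Sep 29 15:29:00 * kernel: pciehp 0000:00:01.5:pcie004: Slot(0-5): Button cancel
# The "Slot(...): " prefix is optional because "Failed to check link status" omits it.
LINE_RE = re.compile(
    r"^\w{3} +\d+ \d{2}:\d{2}:\d{2} .*?kernel: pciehp "
    r"(?P<port>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):pcie\d+: "
    r"(?:Slot\([^)]*\): )?(?P<msg>.+)$"
)


def summarize_pciehp(lines):
    """Count selected pciehp messages per PCIe port (hypothetical helper)."""
    messages = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            messages[m.group("port")].append(m.group("msg"))
    return {
        port: {
            "button_presses": msgs.count("Attention button pressed"),
            "canceled": msgs.count("Action canceled due to button press"),
            "link_check_failed": msgs.count("Failed to check link status"),
        }
        for port, msgs in messages.items()
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 pciehp_summary.py guest-syslog.txt
    with open(sys.argv[1]) as f:
        for port, counts in sorted(summarize_pciehp(f).items()):
            print(port, counts)
```

Ports with a non-zero `canceled` count are the ones where a second unplug request apparently arrived while the first was still pending.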

Hello?

On Fri, Oct 8, 2021 at 4:54 PM Jaze Lee <jazeltq@gmail.com> wrote:
> Hello, we run OpenStack Stein on Arm, with nova-compute (using libvirt as the virt driver) running on the Arm hosts. We found that when VMs with disks (Ceph RBD) are built on the Arm hosts, the VM cannot attach all of its disks correctly. For example, a VM built with six disks may end up with only three attached. No obvious error can be found in the nova-compute or libvirt logs. Comparing aarch64 and x86, we found that when a disk is detached, the dmesg output in the VM's OS differs. Maybe the pciehp parameters are different?
> Has anyone met this problem? Any suggestions?
> x86: nothing at all
-- A modest gentleman (谦谦君子)

On Fri, Oct 8, 2021 at 5:16 PM Jaze Lee <jazeltq@gmail.com> wrote:
> Hello, we run OpenStack Stein on Arm, with nova-compute (using libvirt as the virt driver) running on the Arm hosts. We found that when VMs with disks (Ceph RBD) are built on the Arm hosts, the VM cannot attach all of its disks correctly. For example, a VM built with six disks may end up with only three attached. No obvious error can be found in the nova-compute or libvirt logs. Comparing aarch64 and x86, we found that when a disk is detached, the dmesg output in the VM's OS differs. Maybe the pciehp parameters are different?

Please provide the versions of libvirt, QEMU, openstack-nova and librbd1.

> Has anyone met this problem? Any suggestions?
> x86: nothing at all
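Han Han asks for the versions of libvirt, QEMU, openstack-nova and librbd1. A small Python sketch for gathering them on the compute host follows; it is not from the thread, and the command list is an assumption that varies by distribution (for example, QEMU may be installed as `/usr/libexec/qemu-kvm`, and librbd1 is queried through either `rpm` or `dpkg-query` depending on the packaging system). Missing tools are reported instead of raising.

```python
import subprocess

# Assumed commands per component; adjust binary and package names for your distribution.
COMMANDS = {
    "libvirt": ["libvirtd", "--version"],
    "qemu": ["qemu-system-aarch64", "--version"],
    "openstack-nova": ["nova-compute", "--version"],
    "librbd1 (rpm)": ["rpm", "-q", "librbd1"],
    "librbd1 (deb)": ["dpkg-query", "-W", "librbd1"],
}


def collect_versions(commands=COMMANDS, timeout=10):
    """Run each version command and keep the first line of its output."""
    results = {}
    for name, argv in commands.items():
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        except FileNotFoundError:
            results[name] = "not found on PATH"
            continue
        except subprocess.TimeoutExpired:
            results[name] = "timed out"
            continue
        # Some tools print their version on stderr rather than stdout.
        output = (proc.stdout or proc.stderr).strip()
        first_line = output.splitlines()[0] if output else ""
        if proc.returncode == 0:
            results[name] = first_line
        else:
            results[name] = f"exit code {proc.returncode}: {first_line}"
    return results


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Only one of the two librbd1 entries is expected to succeed on a given host; the other simply reports that its package tool is absent or the package is not installed.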

On Fri, Oct 08, 2021 at 04:54:37PM +0800, Jaze Lee wrote:
> Hello, we run OpenStack Stein on Arm, with nova-compute (using libvirt as the virt driver) running on the Arm hosts. We found that when VMs with disks (Ceph RBD) are built on the Arm hosts, the VM cannot attach all of its disks correctly. For example, a VM built with six disks may end up with only three attached. No obvious error can be found in the nova-compute or libvirt logs. Comparing aarch64 and x86, we found that when a disk is detached, the dmesg output in the VM's OS differs. Maybe the pciehp parameters are different?
> Has anyone met this problem? Any suggestions?

I think you might have just run out of PCI ports available for hotplug. Please try setting https://docs.openstack.org/nova/stein/configuration/config.html#libvirt.num_... to a reasonable value and see whether that helps.

Note that, since you're using libvirt through OpenStack and not directly, you're more likely to find someone who's able to help you out if you use the OpenStack support channels.

-- 
Andrea Bolognani / Red Hat / Virtualization
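Andrea's hypothesis can be checked on the host before changing any nova configuration. On PCIe machine types such as aarch64 `virt`, libvirt hotplugs devices into empty `pcie-root-port` controllers, so a guest with no free root ports cannot accept another disk. The Python sketch below (not from the thread; the function name `free_pcie_root_ports` is hypothetical) parses `virsh dumpxml` output and lists the root ports with nothing plugged in. It assumes the usual libvirt convention that a `pcie-root-port` with `index='N'` provides PCI bus N, and that devices carry `<address type='pci' bus='0x..'/>` with a hexadecimal bus number.

```python
import sys
import xml.etree.ElementTree as ET


def free_pcie_root_ports(domain_xml):
    """Return indexes of pcie-root-port controllers that have nothing plugged in."""
    devices = ET.fromstring(domain_xml).find("devices")
    if devices is None:
        return []
    # A pcie-root-port controller with index N provides PCI bus N.
    root_ports = {
        int(ctrl.get("index"))
        for ctrl in devices.findall("controller")
        if ctrl.get("type") == "pci" and ctrl.get("model") == "pcie-root-port"
    }
    # A device (or controller) whose own PCI address is on bus N occupies port N.
    # Only direct <address> children are checked, so e.g. a hostdev's
    # <source><address> (a host-side address) is ignored.
    used_buses = set()
    for dev in devices:
        addr = dev.find("address")
        if addr is not None and addr.get("type") == "pci" and addr.get("bus") is not None:
            used_buses.add(int(addr.get("bus"), 16))
    return sorted(root_ports - used_buses)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: virsh dumpxml <domain> > dom.xml; python3 free_ports.py dom.xml
    with open(sys.argv[1]) as f:
        free = free_pcie_root_ports(f.read())
    print(f"{len(free)} free pcie-root-port(s): {free}")
```

If the result is empty, or shorter than the number of disks still waiting to be attached, that supports the out-of-ports explanation, and raising the port count through the option Andrea links is the natural next step.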
participants (3)
- Andrea Bolognani
- Han Han
- Jaze Lee