Hi all,
We came across a problem, qemu was accidently killed by libvirtd after we restart libvirtd
by service command.
Steps to reproduce:
1) Attach net device to VM by 'virsh attach-device' command
2) Detach the net device that added before by 'virsh detach-device' command
3) Do the above two steps repeatedly, and at the same time, we restart libvirtd daemon.
And then we have chance to find that, after libvirtd is restarted at one time, qemu is
killed accidently.
The libvirt log message is:
[2015-12-17 11:34:36]: libvirtd : 216046: info : libvirt version: 1.2.17
[2015-12-17 11:34:36]: libvirtd : 216046: warning : virDriverLoadModule:65 : Module
/usr/lib64/libvirt/connection-driver/libvirt_driver_nwfilter.so not accessible
[2015-12-17 11:34:36]: libvirtd : 216046: info : virNetlinkEventServiceStart:662 :
starting netlink event service with protocol 0
[2015-12-17 11:34:36]: libvirtd : 216046: info : virNetlinkEventServiceStart:712 : set
netlink socket big buffer size
[2015-12-17 11:34:36]: libvirtd : 216046: info : virNetlinkEventServiceStart:662 :
starting netlink event service with protocol 15
[2015-12-17 11:34:36]: libvirtd : 216046: info : virNetlinkEventServiceStart:712 : set
netlink socket big buffer size
[2015-12-17 11:34:36]: libvirtd : 216079: error : dnsmasqCapsRefreshInternal:741 : Cannot
check dnsmasq binary /usr/sbin/dnsmasq: No such file or directory
[2015-12-17 11:34:36]: libvirtd : 216079: info : networkReloadFirewallRules:1861 :
Reloading iptables rules
[2015-12-17 11:34:36]: libvirtd : 216079: info : networkRefreshDaemons:1830 : Refreshing
network daemons
[2015-12-17 11:34:36]: libvirtd : 216079: error : virNodeSuspendSupportsTarget:332 :
internal error: Cannot probe for supported suspend types
[2015-12-17 11:34:36]: libvirtd : 216079: warning : virQEMUCapsInit:1037 : Failed to get
host power management capabilities
[2015-12-17 11:34:36]: libvirtd : 216079: info : virDomainObjListLoadAllConfigs:23039 :
Scanning for configs in /var/run/libvirt/qemu
[2015-12-17 11:34:36]: libvirtd : 216079: info : virDomainObjListLoadAllConfigs:23063 :
Loading config file 'VMName.xml'
[2015-12-17 11:34:36]: libvirtd : 216079: info : virDomainObjListLoadAllConfigs:23063 :
Loading config file 'VMName2.xml'
[2015-12-17 11:34:36]: libvirtd : 216079: info : virDomainObjListLoadAllConfigs:23063 :
Loading config file 'VMName3.xml'
[2015-12-17 11:34:36]: libvirtd : 216079: info : virDomainObjListLoadAllConfigs:23039 :
Scanning for configs in /etc/libvirt/qemu
[2015-12-17 11:34:36]: libvirtd : 216079: info : virDomainObjListLoadAllConfigs:23063 :
Loading config file 'VMName.xml'
[2015-12-17 11:34:36]: libvirtd : 216079: info : virDomainObjListLoadAllConfigs:23063 :
Loading config file 'VMName2.xml'
[2015-12-17 11:34:36]: libvirtd : 216079: info : virDomainObjListLoadAllConfigs:23063 :
Loading config file 'VMName3.xml'
[2015-12-17 11:34:36]: libvirtd : 216079: info : qemuDomainSnapshotLoad:490 : Scanning for
snapshots for domain VMName in /var/lib/libvirt/qemu/snapshot/VMName
[2015-12-17 11:34:36]: libvirtd : 216079: info : qemuDomainSnapshotLoad:490 : Scanning for
snapshots for domain VMName2 in /var/lib/libvirt/qemu/snapshot/VMName2
[2015-12-17 11:34:36]: libvirtd : 216079: info : qemuDomainSnapshotLoad:490 : Scanning for
snapshots for domain VMName3 in /var/lib/libvirt/qemu/snapshot/VMName3
[2015-12-17 11:34:36]: libvirtd : 216047: info : remoteDispatchDomainDetachDevice:4105 :
Domain is detached a device by client: 127.0.0.1;0
[2015-12-17 11:34:36]: libvirtd : 216050: info : remoteDispatchDomainDetachDevice:4105 :
Domain is detached a device by client: 127.0.0.1;0
[2015-12-17 11:34:36]: libvirtd : 216047: info : virDomainDetachDevice:8508 : enter
virDomainDetachDevice domainname=VMName2, xml= <interface type='bridge'>^M
<source bridge='br-1'/>^M
<mac address='00:16:3e:76:85:17'/>^M
<virtualport type='openvswitch' />^M
<model type='virtio'/>^M
</interface>^M
^M
[2015-12-17 11:34:36]: libvirtd : 216050: info : virDomainDetachDevice:8508 : enter
virDomainDetachDevice domainname=VMName, xml= <interface type='bridge'>^M
<source bridge='br-1'/>^M
<mac address='00:16:3e:76:85:15'/>^M
2015-12-17 11:34:36]: libvirtd : 216095: error : qemuMonitorJSONCheckError:380 : internal
error: unable to execute QEMU command 'netdev_del': Device 'hostnet5' not
found
[2015-12-17 11:34:36]: libvirtd : 216096: error : qemuMonitorJSONCheckError:380 : internal
error: unable to execute QEMU command 'netdev_del': Device 'hostnet2' not
found
[2015-12-17 11:34:36]: libvirtd : 216097: error : qemuMonitorJSONCheckError:380 : internal
error: unable to execute QEMU command 'netdev_del': Device 'hostnet1' not
found
[2015-12-17 11:34:37]: libvirtd : 216095: info :
virSecurityDACRestoreSecurityFileLabelInternal:325 : Restoring DAC user and group on
'/data/suse11_sp3_64_1'
[2015-12-17 11:34:37]: libvirtd : 216095: info : virSecurityDACSetOwnershipInternal:248 :
Setting DAC user and group on '/data/suse11_sp3_64_1' to '0:0'
[2015-12-17 11:34:37]: libvirtd : 216096: info :
virSecurityDACRestoreSecurityFileLabelInternal:325 : Restoring DAC user and group on
'/data/suse11_sp3_64_2'
[2015-12-17 11:34:37]: libvirtd : 216096: info : virSecurityDACSetOwnershipInternal:248 :
Setting DAC user and group on '/data/suse11_sp3_64_2' to '0:0'
[2015-12-17 11:34:37]: libvirtd : 216095: info : qemuProcessStop:5720 : VM
'VMName' is dying,remove vBMC
[2015-12-17 11:34:37]: libvirtd : 216095: error : rtev_vBMC_send:209 : Failed to remove
domain vBMC bind path for 33ae1cb0-c451-4620-8cee-1bbddd124b82, errno is 2
[2015-12-17 11:34:37]: libvirtd : 216050: info : remoteDispatchDomainAttachDevice:2855 :
Domain is attached a device by client: 127.0.0.1;0
[2015-12-17 11:34:37]: libvirtd : 216050: info : virDomainAttachDevice:8390 : enter
virDomainAttachDevice domainname=VMName, xml=<interface type='bridge'>^M
The VM's qemu log is:
[2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name: device_del, qmp_cmd_arguments:
{"id": "net4"}
[2015-12-17 11:34:30] virtio_set_status:524 virtio-net device status is 1 that means
ACKNOWLEDGE
[2015-12-17 11:34:30] virtio_set_status:524 virtio-net device status is 1 that means
ACKNOWLEDGE
[2015-12-17 11:34:30] virtio_pci_device_unplugged:966 virtio-net
[2015-12-17 11:34:30] monitor_qapi_event_emit:483 {"timestamp":
{"seconds": 1450352070, "microseconds": 402473}, "event":
"DEVICE_DELETED", "data": {"path":
"/machine/peripheral/net4/virtio-backend"}}
[2015-12-17 11:34:30] monitor_qapi_event_emit:483 {"timestamp":
{"seconds": 1450352070, "microseconds": 402586}, "event":
"DEVICE_DELETED", "data": {"device": "net4",
"path": "/machine/peripheral/net4"}}
[2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name: netdev_del, qmp_cmd_arguments:
{"id": "hostnet4"}
[2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name: qom-list, qmp_cmd_arguments:
{"path": "/machine/peripheral"}
---------------------------------------------- At this time, We restart libvirtd just
right ------------------------------------------------------------
[2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name: device_del, qmp_cmd_arguments:
{"id": "net5"}
[2015-12-17 11:34:30] virtio_set_status:524 virtio-net device status is 1 that means
ACKNOWLEDGE
[2015-12-17 11:34:30] virtio_set_status:524 virtio-net device status is 1 that means
ACKNOWLEDGE
[2015-12-17 11:34:30] virtio_pci_device_unplugged:966 virtio-net
[2015-12-17 11:34:30] monitor_qapi_event_emit:483 {"timestamp":
{"seconds": 1450352070, "microseconds": 810030}, "event":
"DEVICE_DELETED", "data": {"path":
"/machine/peripheral/net5/virtio-backend"}}
[2015-12-17 11:34:30] monitor_qapi_event_emit:483 {"timestamp":
{"seconds": 1450352070, "microseconds": 810150}, "event":
"DEVICE_DELETED", "data": {"device": "net5",
"path": "/machine/peripheral/net5"}}
[2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name: netdev_del, qmp_cmd_arguments:
{"id": "hostnet5"}
[2015-12-17 11:34:35] send_control_event:225 virtio serial port 2 send control message
event = 6, value = 0
[2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: qmp_capabilities
[2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-commands
[2015-12-17 11:34:36] handle_qmp_command:5115 qmp_cmd_name: add-fd, qmp_cmd_arguments:
{"fdset-id": 0, "opaque": "/dev/null"}
[2015-12-17 11:34:36] handle_qmp_command:5115 qmp_cmd_name: block-commit,
qmp_cmd_arguments: {"device": "bogus"}
[2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-events
[2015-12-17 11:34:36] send_control_event:225 virtio serial port 2 send control message
event = 6, value = 1
[2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-status
[2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-block
[2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-chardev
[2015-12-17 11:34:36] handle_qmp_command:5115 qmp_cmd_name: qom-list, qmp_cmd_arguments:
{"path": "/machine/peripheral"}
[2015-12-17 11:34:36] handle_qmp_command:5115 qmp_cmd_name: netdev_del, qmp_cmd_arguments:
{"id": "hostnet5"}
[2015-12-17 11:34:36]: shutting down
[2015-12-17 11:34:36] send_control_event:225 virtio serial port 2 send control message
event = 6, value = 0
qemu: terminating on signal 15 from pid 216046(libvirtd)
[2015-12-17 11:34:36] monitor_qapi_event_emit:483 {"timestamp":
{"seconds": 1450352076, "microseconds": 267427}, "event":
"SHUTDOWN"}
[2015-12-17 11:34:36] monitor_qapi_event_emit:483 {"timestamp":
{"seconds": 1450352076, "microseconds": 267773}, "event":
"DEVICE_TRAY_MOVED", "data": {"device":
"drive-ide0-0-1", "tray-open": true}}
We have analyzed this problem, and thought it was triggered by the inconsistent state for
the net device
between qemu and the new libvirtd.
After libvirtd is started, it will connect to qemu, and updates the devices state by
command 'qom-list',
and also it will read the configure information from XML stored.
The codes call process is:
qemuProcessReconnect
->qemuProcessUpdateDevices
->if (qemuDomainUpdateDeviceList(driver, vm, QEMU_ASYNC_JOB_NONE) < 0)
->if ((tmp = old)) {
while (*tmp) {
if (!virStringArrayHasString(priv->qemuDevices, *tmp) &&
virDomainDefFindDevice(vm->def, *tmp, &dev, false) == 0
&&
qemuDomainRemoveDevice(driver, vm, &dev) < 0) {
goto cleanup; --->If we fail here, it will return -1; and it will
call
qemuProcessStop() in qemuProcessReconnect to kill VM.
}
tmp++;
}
}
It is a little hard to reproduce this problem, we can reproduce it in a special way:
1) Attach a net device by virsh command
2) Stop libvirtd
3) Detach the net device by directly send qmp command to QEMU though the monitor, just
like:
# ./qmp-shell /var/lib/libvirt/qemu/redhat.monitor
Welcome to the QMP low-level shell!
Connected to QEMU 2.3.0
(QEMU) qom-list path=/machine/peripheral
{u'return': [{u'type': u'child<virtio-net-pci>',
u'name': u'net1'}, {u'type': u'child<cirrus-vga>',
u'name': u'video0'}, {u'type': u'child<usb-tablet>',
u'name': u'input0'}, {u'type': u'child<isa-serial>',
u'name': u'serial0'}, {u'type':
u'child<virtio-net-pci>', u'name': u'net0'},
{u'type': u'child<scsi-hd>', u'name':
u'scsi0-0-0-0'}, {u'type': u'child<virtio-scsi-pci>',
u'name': u'scsi0'}, {u'type':
u'child<piix3-usb-uhci>', u'name': u'usb'},
{u'type': u'string', u'name': u'type'}]}
(QEMU) device_del id=net1
{u'return': {}}
(QEMU) netdev_del id=hostnet1
{u'return': {}}
4) Restart libvirtd daemon and then you will found that qemu is killed by libvirtd.
The main reason for the inconsistent state is, we accidently restarted libvirtd before it
finished treating
'detach device' work.
It seems a little violent to kill such VM directly.
Can we just do the unfinished cleanup work for the VM after libvirtd is starting ?
(IMHO, we can't ensure not to restart libvirtd while it is treating the detach device
work.)
Thanks.
zhanghailiang