On 12/17/2015 10:45 AM, Hailiang Zhang wrote:
Hi all,
We came across a problem: QEMU was accidentally killed by libvirtd after
we restarted libvirtd with the service command.
Steps to reproduce:
1) Attach a net device to the VM with 'virsh attach-device'
2) Detach that net device with 'virsh detach-device'
3) Repeat the above two steps in a loop, and at the same time restart
the libvirtd daemon.
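The loop in steps 1-3 can be sketched as a shell script. The domain name
'testvm' and the interface XML below are hypothetical placeholders, not
taken from the report, and the loop only runs when explicitly enabled:

```shell
#!/bin/sh
# Reproduction sketch: attach/detach a hypothetical net device in a loop
# while restarting libvirtd. "testvm" and the XML are placeholders.
cat > /tmp/iface.xml <<'EOF'
<interface type='bridge'>
  <source bridge='br-1'/>
  <model type='virtio'/>
</interface>
EOF

# Only run the destructive loop when explicitly requested on a test box.
if [ "${RUN_REPRO:-0}" = 1 ] && command -v virsh >/dev/null 2>&1; then
  i=0
  while [ "$i" -lt 100 ]; do
    virsh attach-device testvm /tmp/iface.xml --live
    virsh detach-device testvm /tmp/iface.xml --live
    i=$((i + 1))
  done &
  # Meanwhile, restart the daemon so it races with a detach in progress:
  service libvirtd restart
  wait
fi
```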
Occasionally, after one of these restarts, QEMU is killed unexpectedly.
The libvirtd log is:
[2015-12-17 11:34:36]: libvirtd : 216046: info : libvirt version: 1.2.17
[2015-12-17 11:34:36]: libvirtd : 216046: warning :
virDriverLoadModule:65 : Module
/usr/lib64/libvirt/connection-driver/libvirt_driver_nwfilter.so not
accessible
[2015-12-17 11:34:36]: libvirtd : 216046: info :
virNetlinkEventServiceStart:662 : starting netlink event service with
protocol 0
[2015-12-17 11:34:36]: libvirtd : 216046: info :
virNetlinkEventServiceStart:712 : set netlink socket big buffer size
[2015-12-17 11:34:36]: libvirtd : 216046: info :
virNetlinkEventServiceStart:662 : starting netlink event service with
protocol 15
[2015-12-17 11:34:36]: libvirtd : 216046: info :
virNetlinkEventServiceStart:712 : set netlink socket big buffer size
[2015-12-17 11:34:36]: libvirtd : 216079: error :
dnsmasqCapsRefreshInternal:741 : Cannot check dnsmasq binary
/usr/sbin/dnsmasq: No such file or directory
[2015-12-17 11:34:36]: libvirtd : 216079: info :
networkReloadFirewallRules:1861 : Reloading iptables rules
[2015-12-17 11:34:36]: libvirtd : 216079: info :
networkRefreshDaemons:1830 : Refreshing network daemons
[2015-12-17 11:34:36]: libvirtd : 216079: error :
virNodeSuspendSupportsTarget:332 : internal error: Cannot probe for
supported suspend types
[2015-12-17 11:34:36]: libvirtd : 216079: warning :
virQEMUCapsInit:1037 : Failed to get host power management capabilities
[2015-12-17 11:34:36]: libvirtd : 216079: info :
virDomainObjListLoadAllConfigs:23039 : Scanning for configs in
/var/run/libvirt/qemu
[2015-12-17 11:34:36]: libvirtd : 216079: info :
virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName.xml'
[2015-12-17 11:34:36]: libvirtd : 216079: info :
virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName2.xml'
[2015-12-17 11:34:36]: libvirtd : 216079: info :
virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName3.xml'
[2015-12-17 11:34:36]: libvirtd : 216079: info :
virDomainObjListLoadAllConfigs:23039 : Scanning for configs in
/etc/libvirt/qemu
[2015-12-17 11:34:36]: libvirtd : 216079: info :
virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName.xml'
[2015-12-17 11:34:36]: libvirtd : 216079: info :
virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName2.xml'
[2015-12-17 11:34:36]: libvirtd : 216079: info :
virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName3.xml'
[2015-12-17 11:34:36]: libvirtd : 216079: info :
qemuDomainSnapshotLoad:490 : Scanning for snapshots for domain VMName
in /var/lib/libvirt/qemu/snapshot/VMName
[2015-12-17 11:34:36]: libvirtd : 216079: info :
qemuDomainSnapshotLoad:490 : Scanning for snapshots for domain VMName2
in /var/lib/libvirt/qemu/snapshot/VMName2
[2015-12-17 11:34:36]: libvirtd : 216079: info :
qemuDomainSnapshotLoad:490 : Scanning for snapshots for domain VMName3
in /var/lib/libvirt/qemu/snapshot/VMName3
[2015-12-17 11:34:36]: libvirtd : 216047: info :
remoteDispatchDomainDetachDevice:4105 : Domain is detached a device by
client: 127.0.0.1;0
[2015-12-17 11:34:36]: libvirtd : 216050: info :
remoteDispatchDomainDetachDevice:4105 : Domain is detached a device by
client: 127.0.0.1;0
[2015-12-17 11:34:36]: libvirtd : 216047: info :
virDomainDetachDevice:8508 : enter virDomainDetachDevice
domainname=VMName2, xml= <interface type='bridge'>^M
<source bridge='br-1'/>^M
<mac address='00:16:3e:76:85:17'/>^M
<virtualport type='openvswitch' />^M
<model type='virtio'/>^M
</interface>^M
^M
[2015-12-17 11:34:36]: libvirtd : 216050: info :
virDomainDetachDevice:8508 : enter virDomainDetachDevice
domainname=VMName, xml= <interface type='bridge'>^M
<source bridge='br-1'/>^M
<mac address='00:16:3e:76:85:15'/>^M
[2015-12-17 11:34:36]: libvirtd : 216095: error :
qemuMonitorJSONCheckError:380 : internal error: unable to execute QEMU
command 'netdev_del': Device 'hostnet5' not found
[2015-12-17 11:34:36]: libvirtd : 216096: error :
qemuMonitorJSONCheckError:380 : internal error: unable to execute QEMU
command 'netdev_del': Device 'hostnet2' not found
[2015-12-17 11:34:36]: libvirtd : 216097: error :
qemuMonitorJSONCheckError:380 : internal error: unable to execute QEMU
command 'netdev_del': Device 'hostnet1' not found
[2015-12-17 11:34:37]: libvirtd : 216095: info :
virSecurityDACRestoreSecurityFileLabelInternal:325 : Restoring DAC
user and group on '/data/suse11_sp3_64_1'
[2015-12-17 11:34:37]: libvirtd : 216095: info :
virSecurityDACSetOwnershipInternal:248 : Setting DAC user and group on
'/data/suse11_sp3_64_1' to '0:0'
[2015-12-17 11:34:37]: libvirtd : 216096: info :
virSecurityDACRestoreSecurityFileLabelInternal:325 : Restoring DAC
user and group on '/data/suse11_sp3_64_2'
[2015-12-17 11:34:37]: libvirtd : 216096: info :
virSecurityDACSetOwnershipInternal:248 : Setting DAC user and group on
'/data/suse11_sp3_64_2' to '0:0'
[2015-12-17 11:34:37]: libvirtd : 216095: info : qemuProcessStop:5720
: VM 'VMName' is dying,remove vBMC
[2015-12-17 11:34:37]: libvirtd : 216095: error : rtev_vBMC_send:209 :
Failed to remove domain vBMC bind path for
33ae1cb0-c451-4620-8cee-1bbddd124b82, errno is 2
[2015-12-17 11:34:37]: libvirtd : 216050: info :
remoteDispatchDomainAttachDevice:2855 : Domain is attached a device by
client: 127.0.0.1;0
[2015-12-17 11:34:37]: libvirtd : 216050: info :
virDomainAttachDevice:8390 : enter virDomainAttachDevice
domainname=VMName, xml=<interface type='bridge'>^M
The VM's qemu log is:
[2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name:
device_del, qmp_cmd_arguments: {"id": "net4"}
[2015-12-17 11:34:30] virtio_set_status:524 virtio-net device status
is 1 that means ACKNOWLEDGE
[2015-12-17 11:34:30] virtio_set_status:524 virtio-net device status
is 1 that means ACKNOWLEDGE
[2015-12-17 11:34:30] virtio_pci_device_unplugged:966 virtio-net
[2015-12-17 11:34:30] monitor_qapi_event_emit:483 {"timestamp":
{"seconds": 1450352070, "microseconds": 402473}, "event":
"DEVICE_DELETED", "data": {"path":
"/machine/peripheral/net4/virtio-backend"}}
[2015-12-17 11:34:30] monitor_qapi_event_emit:483 {"timestamp":
{"seconds": 1450352070, "microseconds": 402586}, "event":
"DEVICE_DELETED", "data": {"device": "net4",
"path":
"/machine/peripheral/net4"}}
[2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name:
netdev_del, qmp_cmd_arguments: {"id": "hostnet4"}
[2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name: qom-list,
qmp_cmd_arguments: {"path": "/machine/peripheral"}
---------------------------------------------- At this point, we
restarted libvirtd
------------------------------------------------------------
[2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name:
device_del, qmp_cmd_arguments: {"id": "net5"}
[2015-12-17 11:34:30] virtio_set_status:524 virtio-net device status
is 1 that means ACKNOWLEDGE
[2015-12-17 11:34:30] virtio_set_status:524 virtio-net device status
is 1 that means ACKNOWLEDGE
[2015-12-17 11:34:30] virtio_pci_device_unplugged:966 virtio-net
[2015-12-17 11:34:30] monitor_qapi_event_emit:483 {"timestamp":
{"seconds": 1450352070, "microseconds": 810030}, "event":
"DEVICE_DELETED", "data": {"path":
"/machine/peripheral/net5/virtio-backend"}}
[2015-12-17 11:34:30] monitor_qapi_event_emit:483 {"timestamp":
{"seconds": 1450352070, "microseconds": 810150}, "event":
"DEVICE_DELETED", "data": {"device": "net5",
"path":
"/machine/peripheral/net5"}}
[2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name:
netdev_del, qmp_cmd_arguments: {"id": "hostnet5"}
[2015-12-17 11:34:35] send_control_event:225 virtio serial port 2 send
control message event = 6, value = 0
[2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name:
qmp_capabilities
[2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name:
query-commands
[2015-12-17 11:34:36] handle_qmp_command:5115 qmp_cmd_name: add-fd,
qmp_cmd_arguments: {"fdset-id": 0, "opaque": "/dev/null"}
[2015-12-17 11:34:36] handle_qmp_command:5115 qmp_cmd_name:
block-commit, qmp_cmd_arguments: {"device": "bogus"}
[2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-events
[2015-12-17 11:34:36] send_control_event:225 virtio serial port 2 send
control message event = 6, value = 1
[2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-status
[2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-block
[2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-chardev
[2015-12-17 11:34:36] handle_qmp_command:5115 qmp_cmd_name: qom-list,
qmp_cmd_arguments: {"path": "/machine/peripheral"}
[2015-12-17 11:34:36] handle_qmp_command:5115 qmp_cmd_name:
netdev_del, qmp_cmd_arguments: {"id": "hostnet5"}
[2015-12-17 11:34:36]: shutting down
[2015-12-17 11:34:36] send_control_event:225 virtio serial port 2 send
control message event = 6, value = 0
qemu: terminating on signal 15 from pid 216046(libvirtd)
[2015-12-17 11:34:36] monitor_qapi_event_emit:483 {"timestamp":
{"seconds": 1450352076, "microseconds": 267427}, "event":
"SHUTDOWN"}
[2015-12-17 11:34:36] monitor_qapi_event_emit:483 {"timestamp":
{"seconds": 1450352076, "microseconds": 267773}, "event":
"DEVICE_TRAY_MOVED", "data": {"device":
"drive-ide0-0-1", "tray-open":
true}}
We have analyzed this problem, and believe it is triggered by an
inconsistent view of the net device between QEMU and the newly started
libvirtd.
After libvirtd starts, it reconnects to QEMU and updates the device
state with the 'qom-list' command, and it also reads the configuration
from the stored XML files.
The code call path is:
qemuProcessReconnect
  -> qemuProcessUpdateDevices
     -> if (qemuDomainUpdateDeviceList(driver, vm, QEMU_ASYNC_JOB_NONE) < 0)
     -> if ((tmp = old)) {
            while (*tmp) {
                if (!virStringArrayHasString(priv->qemuDevices, *tmp) &&
                    virDomainDefFindDevice(vm->def, *tmp, &dev, false) == 0 &&
                    qemuDomainRemoveDevice(driver, vm, &dev) < 0) {
                    goto cleanup;  /* If we fail here, qemuProcessUpdateDevices
                                      returns -1, and qemuProcessReconnect then
                                      calls qemuProcessStop() to kill the VM. */
                }
                tmp++;
            }
        }
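The effect of that loop can be modeled outside libvirt. A minimal shell
sketch (device names are hypothetical; this is a model of the diff, not
libvirt's C code): any name present in the remembered list but absent
from the fresh 'qom-list' result is treated as stale, and a failed
removal of a stale entry aborts the reconnect:

```shell
#!/bin/sh
# stale_devices OLD CURRENT
# Print every device remembered from before the restart (OLD) that no
# longer shows up in the fresh qom-list result (CURRENT) -- these are
# the entries libvirt tries to remove, and where a failure kills the VM.
stale_devices() {
  old=$1 current=$2
  for dev in $old; do
    case " $current " in
      *" $dev "*) ;;                 # still present in QEMU, nothing to do
      *) printf '%s\n' "$dev" ;;     # stale: removal will be attempted
    esac
  done
}

# net5/hostnet5 were deleted behind libvirt's back, so they come out stale:
stale_devices "net4 net5 hostnet5" "net4"
```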
This problem is hard to reproduce directly, but we can reproduce it in
a special way:
1) Attach a net device with the virsh command
2) Stop libvirtd
3) Detach the net device by sending QMP commands directly to QEMU
through the monitor socket, like this:
# ./qmp-shell /var/lib/libvirt/qemu/redhat.monitor
Welcome to the QMP low-level shell!
Connected to QEMU 2.3.0
(QEMU) qom-list path=/machine/peripheral
{u'return': [{u'type': u'child<virtio-net-pci>', u'name': u'net1'},
             {u'type': u'child<cirrus-vga>', u'name': u'video0'},
             {u'type': u'child<usb-tablet>', u'name': u'input0'},
             {u'type': u'child<isa-serial>', u'name': u'serial0'},
             {u'type': u'child<virtio-net-pci>', u'name': u'net0'},
             {u'type': u'child<scsi-hd>', u'name': u'scsi0-0-0-0'},
             {u'type': u'child<virtio-scsi-pci>', u'name': u'scsi0'},
             {u'type': u'child<piix3-usb-uhci>', u'name': u'usb'},
             {u'type': u'string', u'name': u'type'}]}
(QEMU) device_del id=net1
{u'return': {}}
(QEMU) netdev_del id=hostnet1
{u'return': {}}
4) Restart the libvirtd daemon; you will then find that QEMU is killed
by libvirtd.
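qmp-shell is just a thin wrapper over the QMP wire protocol, so the
step-3 detach can also be done with any client that speaks to the
monitor socket. A sketch of the raw JSON commands (the socket path and
device ids are the same example values as above; socat is only one
possible client):

```shell
#!/bin/sh
# The commands qmp-shell issues in step 3, in raw QMP wire form.
# A QMP session must begin with the qmp_capabilities handshake.
CMDS='{"execute": "qmp_capabilities"}
{"execute": "device_del", "arguments": {"id": "net1"}}
{"execute": "netdev_del", "arguments": {"id": "hostnet1"}}'

# Send them to the monitor socket if it and socat exist (example path):
if command -v socat >/dev/null 2>&1 && [ -S /var/lib/libvirt/qemu/redhat.monitor ]; then
  printf '%s\n' "$CMDS" | socat - UNIX-CONNECT:/var/lib/libvirt/qemu/redhat.monitor
fi
```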
The root cause of the inconsistent state is that libvirtd was restarted
before it finished handling the 'detach device' work.
Killing such a VM outright seems rather drastic.
Could libvirtd instead finish the outstanding cleanup work for the VM
after it starts up?
(IMHO, we cannot guarantee that libvirtd is never restarted while it is
handling a device detach.)
Thanks for the detailed debugging and analysis!
While I don't think that libvirt should be required to deal properly
with someone going into the qmp shell and detaching a device while
libvirtd is down, it *should* properly handle the case where libvirtd is
restarted while a device detach is in progress (I had thought that
worked properly, but confess I've never actually tried it).
So that we don't lose track of it, can you file a bug for this in the
libvirt upstream bugzilla:
https://libvirt.org/bugs.html