[libvirt-users] reboot problem with libxl

If I reboot a single VM through libvirt/libxl, the system reboots normally. If I have several VMs reboot at the same time, the systems go into a paused state and do not reboot. I then have to kill them via xl and restart them.
--
Alvin Starr || voice: (905)513-7688
Netvel Inc. || Cell: (416)806-0133
alvin@netvel.net ||

On Thu, Oct 30, 2014 at 01:00:04PM -0400, Alvin Starr wrote:
If I reboot a single VM through libvirt/libxl, the system reboots normally. If I have several VMs reboot at the same time, the systems go into a paused state and do not reboot. I then have to kill them via xl and restart them.
Do the logs [1] uncover something?
Martin
[1] http://libvirt.org/logging.html

I was sort of hoping that it was something simple like setting the "do_the_right_thing" flag.
libvirtd kicks out:
2014-10-31 11:58:57.111+0000: 8741: error : virRegisterNetworkDriver:549 : driver in virRegisterNetworkDriver must not be NULL
2014-10-31 11:59:29.379+0000: 8840: error : virRegisterNetworkDriver:549 : driver in virRegisterNetworkDriver must not be NULL
2014-10-31 12:02:03.419+0000: 14712: error : virRegisterNetworkDriver:549 : driver in virRegisterNetworkDriver must not be NULL
2014-10-31 12:02:20.547+0000: 14712: error : virNetlinkEventCallback:343 : nl_recv returned with error: No buffer space available
2014-10-31 12:02:21.873+0000: 17428: error : virRegisterNetworkDriver:549 : driver in virRegisterNetworkDriver must not be NULL
2014-10-31 12:03:06.721+0000: 17428: error : virNetlinkEventCallback:343 : nl_recv returned with error: No buffer space available
(I deleted the other errors caused by trying to load drivers that don't exist.)
I rebooted 3 systems, mirantis_[457].
/var/log/libxl/* kicks out:
mirantis_4.log:libxl: error: libxl_dm.c:1311:libxl__destroy_device_model: Device Model already exited
mirantis_5.log:libxl: error: libxl_dm.c:1311:libxl__destroy_device_model: Device Model already exited
These are more interesting.
I wonder if libxl has a race condition.

On Fri, Oct 31, 2014 at 08:34:48AM -0400, Alvin Starr wrote:
/var/log/libxl/* kicks out:
mirantis_4.log:libxl: error: libxl_dm.c:1311:libxl__destroy_device_model: Device Model already exited
mirantis_5.log:libxl: error: libxl_dm.c:1311:libxl__destroy_device_model: Device Model already exited
These are more interesting.
I wonder if libxl has a race condition.
I'm completely unaware of how libxl works, so I can only guess. But this looks like an error in libxl. Maybe someone else has an idea?
Martin

I am not sure if this is a libvirt or libxl problem.
It looks as if each running VM has an associated event thread, and this thread calls libxl_destroy to clean up the rebooting processes.
Possibly these threads should take a lock to ensure synchronization and allow only one reboot or termination at a time.
I will try to add some diagnostics and run a few more tests.
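
The locking idea raised in the thread (per-VM event threads taking a lock so that only one reboot or termination runs at a time) could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not libvirt or libxl code: `reboot_all`, `cleanup_domain`, and `event_thread` are hypothetical names, the teardown is simulated with a short sleep, and nothing here confirms that a missing lock is the actual cause of the paused domains.

```python
import threading
import time


def reboot_all(names, serialize=True):
    """Run one simulated event thread per domain.

    Returns (peak number of overlapping teardowns, sorted finished names).
    """
    teardown_lock = threading.Lock()  # the proposed "one teardown at a time" lock
    stats_lock = threading.Lock()     # protects the bookkeeping below
    stats = {"active": 0, "peak": 0, "finished": []}

    def cleanup_domain(name):
        # Hypothetical stand-in for the per-domain cleanup;
        # it does not call into libvirt or libxl.
        with stats_lock:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
        time.sleep(0.01)  # simulated teardown work
        with stats_lock:
            stats["active"] -= 1
            stats["finished"].append(name)

    def event_thread(name):
        if serialize:
            # Only one thread at a time may run the cleanup.
            with teardown_lock:
                cleanup_domain(name)
        else:
            cleanup_domain(name)

    threads = [threading.Thread(target=event_thread, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats["peak"], sorted(stats["finished"])


if __name__ == "__main__":
    # Domain names taken from the mirantis_[457] systems mentioned in the thread.
    peak, done = reboot_all(["mirantis_4", "mirantis_5", "mirantis_7"])
    print("peak overlapping teardowns:", peak)
    print("finished:", done)
```

With the lock held around the cleanup, the peak overlap is 1 however many domains reboot together. Passing `serialize=False` shows the overlap the lock is meant to prevent, though the exact figure then depends on thread scheduling.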
participants (2)
- Alvin Starr
- Martin Kletzander